Whenever I declare a double quote character as a char in Java, the syntax coloring gets messed up.
For example, after the following line:
char c = '"';
everything after it is colored as a string. I think the reason is that the double quote opens a string and the grammar cannot find a closing double quote.
Here is the screenshot.
I’ve tried to forget Java and it seems to be working. Are single quotes reserved for char data?
Anyway, to fix this I had to add another pattern (a new variable) to the grammar’s repository.
Change line 284 from:
string-quoted-double: {
begin: /"/
end: /"/
name: 'string.quoted.double.java'
swallow: '\\.'
}
To:
string-quoted-double: {
begin: /"/
end: /"/
name: 'string.quoted.double.java'
patterns: [ { match: /\\./ } ]
}
string-quoted-single: {
begin: /'/
end: /'/
name: 'string.quoted.single.java'
patterns: [ { match: /\\./ } ]
}
Also add the following after line 223:
{
include: '#string-quoted-single'
}
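To see why the extra single-quoted rule helps, here is a rough Java sketch using java.util.regex (it is not Intype’s engine, and the class and method names are made up for illustration). It assumes Java’s behavior of taking the leftmost match and, at that position, the first alternative that works is close enough to how the grammar picks between patterns. A pattern that only knows double-quoted strings pairs the double quote inside '"' with the opening quote of the next string; trying a single-quoted span first consumes the char literal whole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSpans {
    // Only double-quoted strings (backslash escapes skipped), roughly the grammar before the fix.
    static final Pattern DOUBLE_ONLY =
        Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");

    // Single-quoted spans are listed first, so the '"' literal is consumed whole
    // before its double quote can be taken as the start of a string.
    static final Pattern SINGLE_THEN_DOUBLE =
        Pattern.compile("'(?:\\\\.|[^'\\\\])*'|\"(?:\\\\.|[^\"\\\\])*\"");

    // Collects every quoted span that find() reports, scanning left to right.
    static List<String> spans(Pattern p, String line) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "char c = '\"'; String s = \"hi\";";
        System.out.println(spans(DOUBLE_ONLY, line));        // ["'; String s = "]
        System.out.println(spans(SINGLE_THEN_DOUBLE, line)); // ['"', "hi"]
    }
}
```

Note that the single-quoted alternative also accepts something like 'abc', which is the junk-string problem raised later in this thread.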
If single quotes are reserved for char data, it would be better to give the repository variable a more descriptive name.
So THAT’s what I didn’t do right... I thought I could have two includes in a single {} section, such as:
{
include: '#string-quoted-double'
include: '#string-quoted-single'
}
...but, obviously, that doesn’t work; each include needs its own {} entry. :P
However, unless I’m wrong (which I might be), isn’t the single quote for (single) chars only? If so, not only is this incorrect in Java, but C/C++ doesn’t handle it properly either. Since you’ve been working with the REGEX a bit more, maybe you’d know how to modify it to work? I’m thinking of something more like the following:
string-quoted-single: {
match: /'.'/
name: 'string.quoted.single.java'
}
So, if I’m correct on the single quote thing, then this is the correct solution.
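A quick Java check of the proposed /'.'/ rule (class and method names are illustrative). Here matches() tests one complete literal, whereas a grammar match rule searches within the line. It accepts 'a' and '"', but rejects an escape such as '\n', because the backslash and the n are two characters.

```java
import java.util.regex.Pattern;

public class SingleCharProposal {
    // The proposed rule: a quote, exactly one character, a quote.
    static final Pattern ONE_CHAR = Pattern.compile("'.'");

    // matches() tests the whole string, i.e. one complete literal.
    static boolean isCharLiteral(String literal) {
        return ONE_CHAR.matcher(literal).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCharLiteral("'a'"));           // true
        System.out.println(isCharLiteral("'\"'"));          // true
        System.out.println(isCharLiteral("'\\n'"));         // false: \ and n are two characters
        System.out.println(isCharLiteral("'fdsafdsfsd'"));  // false
    }
}
```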
...I took a look at how C/C++ handles this, and it completely confuses me. When I was programming in C++, only single characters were allowed... oh, wait, it’s allowing for hex characters and such, so mine actually WON’T work as expected. But there’s still a bug in the REGEX, since I can put an entire sentence within single quotes (under C/C++) and it is still highlighted as a single-quoted string. I’ll keep playing with it, but a C/C++ expert could tell us what is and is not allowed.
...ok, the REGEX is too confusing. I’m lost.
Regex in question:
string_escaped_char – line 383, C.itGrammar
It matches octal, hex, and escaped chars like \n inside single quotes.
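regex: /\\ ( \\|[abefnprtv'"?]|[0-3]\d{,2}|[4-7]\d?|x[a-fA-F0-9]{,2})/
After the leading \\, the alternatives match:
\\ : an escaped backslash, e.g. \\
[abefnprtv'"?] : an escaped char, e.g. \n
[0-3]\d{,2} : octal, e.g. \000 (at most two digits after the 0-3)
[4-7]\d? : octal, e.g. \40 (zero or one digit after the 4-7)
x[a-fA-F0-9]{,2} : hex, e.g. \x93 (at most two hex chars after the x)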
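For anyone who wants to poke at the C regex outside the editor, here is a rough Java translation (class and method names are illustrative). Two assumptions: {,2} is read the Oniguruma way, as zero to two repetitions, which Java’s Pattern rejects as an illegal repetition, so it is written {0,2} here; and the space after the leading \\ is dropped on the assumption that the grammar compiles this pattern in free-spacing mode.

```java
import java.util.regex.Pattern;

public class CEscapeRegex {
    // string_escaped_char from the C grammar, in Java syntax.
    // {,2} became {0,2}; the free-spacing space after the leading backslash is removed.
    static final Pattern ESCAPED_CHAR = Pattern.compile(
        "\\\\(\\\\|[abefnprtv'\"?]|[0-3]\\d{0,2}|[4-7]\\d?|x[a-fA-F0-9]{0,2})");

    // True if the whole string is one escape sequence by this regex's rules.
    static boolean isEscape(String s) {
        return ESCAPED_CHAR.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"\\\\", "\\n", "\\000", "\\40", "\\x93", "\\q", "\\0000", "\\x", "\\09"};
        for (String s : samples) {
            System.out.println(s + " -> " + isEscape(s));
        }
    }
}
```

The examples from the list above all pass and \q does not. Note that \d also admits 8 and 9, so \09 passes, and {0,2} lets a bare \x through with no hex digits.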
If it is supposed to be this way in Java, then you would need to adapt the fix above to something like the C grammar.
I have updated the java.itGrammar file with the fix.
This version does it the C way.
tstrokes, I was pretty sure it matched special escaped characters, such as newline, carriage return, tab, beep, etc. The problem is that it doesn’t actually work as intended under Intype’s REGEX engine: it also matched a junk character string such as ‘sggfdsfdsafd’. I tested this under the C++ grammar just to be sure.
What scope does it show for the junk character string?
If you have an escaped char in the string, does it have the correct scope?
tstrokes: Oh come on, can’t you try it too? :P
Change to C++ mode. Type the following code:
char someChar = 'fdsafdsfsd';
The scope within the junk character string is, as expected, string.quoted.single.c | source.c++, where the pipe denotes a newline in the scope display (I think that separates child scopes from their parents).
What it should do is leave that scope after the first matched single character reference.
Yeah, I did try it in all three grammars: Java (my version), C, and C++.
What I should have asked was what it should do, which you kindly explained.
Thanks.
Heh…sorry, and…no problem! :P
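In Java, character literals can be:
1. a single character, e.g. 'a'
2. a simple escape sequence: \b, \t, \n, \f, \r, \", \' or \\
3. a Unicode escape, \u followed by four hex digits, e.g. '\u0041'
4. an octal escape from \0 to \377, e.g. '\101'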
Here’s my regex; it works for all four cases and marks invalid literals:
string-quoted-single: {
begin: /'/
end: /'/
name: 'string.quoted.single.java'
patterns: [
{
match: /(?<=')\\([bfnrt"'\\]|u[0-9a-fA-F]{4}|[0-7][0-7]?|[0-3][0-7]{2})(?='\s*;)/
name: 'constant.character.escape'
}
{
match: /(?<!'\\u)(.{2,}|\\)(?='\s*;)/
name: 'invalid.illegal'
}
]
}
Edit: Made the escape sequence regex more specific, and added one more subpattern for a single backslash (invalid).
Edit: Okay, I believe I have covered everything... and made it shorter.
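For comparison, here is a whole-literal check of the same four forms (a single character, a simple escape, a Unicode escape, an octal escape) as a Java sketch; the class and method names are illustrative. Unlike the begin/end rule, it stops after exactly one character or escape, so anything longer fails. It is a highlighting approximation rather than the compiler’s exact rule, since Java translates Unicode escapes before it tokenizes literals.

```java
import java.util.regex.Pattern;

public class JavaCharLiteral {
    // A whole Java char literal: a quote, exactly one of the four forms, a quote.
    //   1. one character other than a quote, a backslash or a line break
    //   2. a simple escape: \b \t \n \f \r \" \' \\
    //   3. a Unicode escape: backslash, u, then four hex digits
    //   4. an octal escape from \0 to \377
    static final Pattern CHAR_LITERAL = Pattern.compile(
        "'(?:[^'\\\\\\r\\n]"
            + "|\\\\[btnfr\"'\\\\]"
            + "|\\\\u[0-9a-fA-F]{4}"
            + "|\\\\(?:[0-3][0-7]{2}|[0-7]{1,2}))'");

    static boolean isValid(String literal) {
        return CHAR_LITERAL.matcher(literal).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"'a'", "'\"'", "'\\n'", "'\\u0041'", "'\\101'", "'ab'", "'\\q'", "'\\400'"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The first five samples pass; 'ab', '\q' and '\400' (octal above \377) fail.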
Three questions:
1. On your third regex, /(?<!'\\u)..., is that supposed to be similar to the above two?
2. Is (?<=') a lookbehind?
3. I’m guessing (?='\s*;) is a lookahead, but what else is it doing?
Some of the more advanced features of regular expressions I still have to learn (like, anything beyond the basics).
I’ve looked around the web for information on C++ escape sequences, and it’s kinda inconsistent. E.g., some sites give escaped hex chars as “\xdd”, while others say the hex sequence can be as long as you want, as in “\xdddd…”.
So, the question is: should we target a specific standard or try to cover all possibilities?
Thanks, idyllrain. :)
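To make the hex-escape difference concrete, this Java sketch (names are illustrative) runs two readings on the characters \x4142. BOUNDED mirrors the C grammar’s limit of two hex digits; UNBOUNDED follows the C and C++ standards, whose grammar lets \x take every hex digit that follows. Under the standard reading, a value too large for the character type is out of range, rather than being split into \x41 followed by the characters 42.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexEscapeLength {
    // The C grammar's reading: at most two hex digits after \x.
    static final Pattern BOUNDED = Pattern.compile("\\\\x[0-9a-fA-F]{0,2}");
    // The standards' reading: one or more hex digits, as many as follow.
    static final Pattern UNBOUNDED = Pattern.compile("\\\\x[0-9a-fA-F]+");

    // Returns the first match of p in s, or null if there is none.
    static String firstMatch(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String body = "\\x4142"; // the characters \x4142 as they appear in source
        System.out.println(firstMatch(BOUNDED, body));   // \x41
        System.out.println(firstMatch(UNBOUNDED, body)); // \x4142
    }
}
```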
BrendonKoz:
1. (?<!'\\u) is a negative lookbehind. I’m basically asserting that the characters '\u do not appear immediately before the match of 2 or more characters.
2. Yes, (?<=') is a positive lookbehind, asserting that there is a single quote character immediately to the left of the match.
3. You’re right, that’s a positive lookahead, asserting that after the match, there is a single quote character, followed by zero or more whitespace characters, and a semicolon.
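Java’s java.util.regex supports both kinds of lookaround, so the escape rule from the grammar above can be tried directly; this is a sketch and the names are illustrative. The pattern is the rule’s match: line, escaped as a Java string. Neither lookaround becomes part of the matched text, so only the escape itself is returned.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeLookaround {
    // (?<=')      positive lookbehind: a single quote sits just left of the match
    // (?='\s*;)   positive lookahead: a quote, optional whitespace, then ; follow it
    static final Pattern ESCAPE = Pattern.compile(
        "(?<=')\\\\([bfnrt\"'\\\\]|u[0-9a-fA-F]{4}|[0-7][0-7]?|[0-3][0-7]{2})(?='\\s*;)");

    // Returns the first escape found in the line, or null if there is none.
    static String escapeIn(String line) {
        Matcher m = ESCAPE.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(escapeIn("char c = '\\u0041';"));
        System.out.println(escapeIn("char c = '\\101' ;"));
        System.out.println(escapeIn("foo('\\n');")); // null: no ; right after the quote
    }
}
```

The last case shows a side effect of the lookahead: an escape inside a call such as foo('\n') is not matched, because no ; directly follows the closing quote.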
Wonderful… Visual Studio’s syntax highlighting has the same effect (any junk character string is matched). :P
I didn’t find anything decent in my local MSDN documentation. I think the best source to structure a REGEX from would be your online Microsoft documentation link (in the case of C++). I am uncertain whether or not the base language of C will allow for all of those characters; I would think not. Then again, who’s to say that gcc supports all of those either? Crap, who was the C/C++ programmer from the Intro thread? :P svenax?
tstrokes: If the book mention was directed at me, I own the Perl bible book. There’s an entire (large) chapter devoted to regular expressions. I’ve read through it once; I should probably do it again. :)
My one and only C++ reference book from ages ago also does not mention support for all those characters. And I’m rather unclear on the most current language specifications for it. Anyhow, if C doesn’t support it while C++ does, we can just put in separate definitions in the respective grammar files.
Talking about regex books, does anyone here have Mastering Regular Expressions? Is it as good as everyone seems to be saying it is? I’m planning to buy it, but I might be tempted to skip that book for Edward Tufte’s Envisioning Information. Heh…
I’ve got both books and they’re both excellent. By the time you’ve finished reading Friedl’s Regex book, you’ll be dreaming in Regular Expressions – it’s pretty comprehensive and focused. However, to be fair, you can look most regex stuff up online when you need something, so it’s maybe not a book you need.
If you’re only going to get one, I think I would get the Tufte book – his books are stellar and more broadly applicable to lots of information architecture topics, rather than being focussed on one thing.
Kinda broke down on Valentine’s Day and went to purchase the only copy of Envisioning Information in my whole country… superb book!
Heh – good for you. At least that way you know that the gift will be appreciated!! :)