As a follow-up to my previous question (ANTLRWorks 1.4.3 can't properly read extended-ASCII characters), I created a simple text file using a hex editor:
' ' '£' '°' 'ç'
Or in hex:
27 A0 27 20 27 A3 27 20 27 B0 27 20 27 E7 27
The resulting file reads fine in Notepad++. Upon opening in ANTLRWorks 1.4.3 the (extended) ASCII characters are displayed as square boxes. Upon saving the file after adding and removing a space at the end of the line, the hexadecimal file view looks as follows:
27 3F 20 27 A3 27 20 27 B0 27 20 27 3F
For some reason the initial non-breaking space (A0) between apostrophes, together with the apostrophe following it, was replaced by a single question mark (3F). The same happened to the c with cedilla (E7) and the apostrophe following it. The pound sign (A3) and the degree sign (B0) survived unchanged.
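For anyone trying to reproduce this, the two hex dumps can be compared with a short Python sketch. It assumes the file is meant to be ISO-8859-1 (Latin-1), where each byte is exactly one character; the names `original` and `saved` are just labels for the two dumps above.

```python
# Bytes as written with the hex editor, and as saved back by ANTLRWorks
# (both copied from the hex dumps in the question).
original = bytes.fromhex("27 A0 27 20 27 A3 27 20 27 B0 27 20 27 E7 27")
saved = bytes.fromhex("27 3F 20 27 A3 27 20 27 B0 27 20 27 3F")

# Assumption: the intended encoding is ISO-8859-1, one byte per character.
print(repr(original.decode("latin-1")))
print(repr(saved.decode("latin-1")))

# The saved file is two bytes shorter than the original.
print(len(original), len(saved))  # 15 13

# The middle of the file is byte-for-byte identical ...
print(original[3:13] == saved[2:12])  # True

# ... so each '?' (3F) stands in for a two-byte pair from the original.
print(original[1:3].hex(), "->", saved[1:2].hex())      # a027 -> 3f
print(original[13:15].hex(), "->", saved[12:13].hex())  # e727 -> 3f
```

That each question mark swallows two bytes suggests the file is being decoded with some multi-byte character set rather than Latin-1, but this is only a guess from the dumps.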
It seems that some extended-ASCII characters are corrupted when the file is loaded and saved again. Can anyone here replicate this issue and/or offer a possible reason and solution?
Thanks in advance.
You could use Unicode escapes instead. To match the pound sign, for example, you would write:
PoundSign : '\u00A3';
instead of:
PoundSign : '£';
Both should match the same character, and the first, being pure ASCII, is much less likely to be mangled.
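If a grammar already contains many such literals, the conversion can be scripted. The following is a minimal Python sketch, not part of ANTLR: the helper name `to_antlr_escapes` is hypothetical, and it assumes the four-hex-digit `\uXXXX` form shown above, so characters outside the Basic Multilingual Plane are rejected rather than guessed at.

```python
def to_antlr_escapes(text: str) -> str:
    # Replace every non-ASCII character with a backslash-u escape
    # followed by four uppercase hex digits; ASCII is left untouched.
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)
        elif code <= 0xFFFF:
            out.append("\\u%04X" % code)
        else:
            # Assumption: only the four-digit escape form is used.
            raise ValueError("character U+%X does not fit in four hex digits" % code)
    return "".join(out)


# "\u00a3" is the pound sign, written as an escape so that this script
# itself stays pure ASCII.
rule = "PoundSign : '\u00a3';"
print(to_antlr_escapes(rule))  # PoundSign : '\u00A3';
```

The resulting grammar text contains only ASCII bytes, so it reads the same regardless of which character set the editor assumes when opening the file.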