ANTLRWorks 1.4.3 can't display and mutilates extended-ASCII characters


As a follow-up to my previous question (ANTLRWorks 1.4.3 can't properly read extended-ASCII characters), I created a simple text file using a hex editor:

' ' '£' '°' 'ç'

Or in hex:

27 A0 27 20 27 A3 27 20 27 B0 27 20 27 E7 27

The resulting file reads fine in Notepad++. When opened in ANTLRWorks 1.4.3, the (extended) ASCII characters are displayed as square boxes. After adding and then removing a space at the end of the line and saving the file, the hex view looks as follows:

27 3F 20 27 A3 27 20 27 B0 27 20 27 3F

For some reason the initial non-breaking space (A0) between the apostrophes, together with the apostrophe following it, was replaced by a single question mark (3F). The same happened to the c with cedilla (E7) and the apostrophe following it. The pound sign (A3) and the degree sign (B0) survived the save unchanged.
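
To check exactly which bytes changed, the two hex dumps above can be diffed. Below is a minimal Python sketch using the standard `difflib` module. The variable names and the helper `changed_spans` are made up for illustration and are not part of ANTLRWorks.

```python
import difflib

# Hex dumps copied from the post: the original file, and the same file
# after it was saved by ANTLRWorks 1.4.3.
before = bytes.fromhex("27 A0 27 20 27 A3 27 20 27 B0 27 20 27 E7 27")
after = bytes.fromhex("27 3F 20 27 A3 27 20 27 B0 27 20 27 3F")


def changed_spans(old: bytes, new: bytes):
    """Return (old_bytes, new_bytes) pairs for every span that differs.

    Hypothetical helper for illustration only.
    """
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def as_hex(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, like a hex editor."""
    return " ".join(f"{b:02X}" for b in data)


for old_part, new_part in changed_spans(before, after):
    print(as_hex(old_part), "->", as_hex(new_part))
```

On these two dumps the script should report that `A0 27` and `E7 27` were each collapsed into a single `3F`, while everything in between was left alone.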

It seems that the presence of extended ASCII characters somehow results in things going horribly wrong. Can anyone here replicate this issue and/or offer a possible reason and solution?

Thanks in advance.


Solution

  • You could just use Unicode escapes instead. For example, to match the pound sign, you would write:

    PoundSign : '\u00A3';
    

    instead of:

    PoundSign : '£';
    

    Both (should) match the same character, and because the escaped form contains only plain ASCII, it may very well not be mangled.
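
If a grammar contains many such literals, the replacement could be automated. The following is a rough Python sketch, not an ANTLR feature: the function name `escape_non_ascii` is hypothetical, and it assumes the grammar text has already been decoded correctly (for example from ISO-8859-1) and that every non-ASCII character fits in a four-digit `\uXXXX` escape. It escapes every non-ASCII character it finds, including those in comments.

```python
def escape_non_ascii(grammar_text: str) -> str:
    """Replace each character above U+007F with a \\uXXXX escape.

    Hypothetical helper for illustration; characters outside the
    Basic Multilingual Plane are rejected rather than guessed at.
    """
    out = []
    for ch in grammar_text:
        code = ord(ch)
        if code < 0x80:
            # Plain ASCII is copied through unchanged.
            out.append(ch)
        elif code <= 0xFFFF:
            # Four uppercase hex digits, e.g. U+00A3 -> \u00A3.
            out.append("\\u{:04X}".format(code))
        else:
            raise ValueError("character U+{:X} needs more than 4 hex digits".format(code))
    return "".join(out)


# The rule from the answer above, written with a literal pound sign.
print(escape_non_ascii("PoundSign : '\u00a3';"))
```

The result is pure ASCII, so it should pass through an editor unchanged even if that editor guesses the file encoding wrong.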