
In which encoding is 0xDB a currency symbol?


I received files which, sadly, I cannot get info about how they were generated. I need to parse these files.

The file is entirely ASCII except for one character: 0xDB (219 in decimal).

From looking at the file, this character is obviously a currency symbol. I know this because:

  • it is mandatory for these files to contain a currency symbol wherever an amount appears
  • there is no other currency symbol (neither $ nor euro nor anything else) anywhere in the files
  • every time 0xDB appears, it is next to an amount

I think that in these files 0xDB is supposed to represent the Euro symbol (it is very likely that 0xDB appears everywhere a Euro symbol is supposed to appear).

The file command says this about the files:

ISO-8859 English text, with CRLF, LF line terminators

A hexdump gives this:

00000030  71 75 61 6e 74 20 db 32  2e 36 30 0a 20 41 49 4d  |quant .2.60. AIM|
                            ^^                                     ^

The files are otherwise normally formatted and parsable. I can extract all the information fine except for that odd 0xDB character.

Does anyone know what's going on? How did a currency symbol (supposedly the euro symbol) somehow become a 0xDB?

It's neither ISO-8859-1 (aka ISO Latin 1) nor ISO-8859-15, because in both cases code point 219 corresponds to 'Û' (just as Unicode code point 219 is 'LATIN CAPITAL LETTER U WITH CIRCUMFLEX').
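The claim above can be checked quickly. This is a minimal sketch, assuming Python 3 and its standard-library codecs `iso8859_1` and `iso8859_15`; the variable names are only illustrative.

```python
# 0xDB is the single non-ASCII byte found in the files.
mystery = bytes([0xDB])

# Decode the byte with both ISO-8859 variants mentioned above.
as_latin1 = mystery.decode("iso8859_1")
as_latin9 = mystery.decode("iso8859_15")

print(as_latin1, as_latin9)  # Û Û
print(hex(ord(as_latin1)))   # 0xdb
```

Both decodings give 'Û', so neither encoding explains a currency symbol at that position.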

It's not extended-ASCII.


Solution

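  • It could be Mac OS Roman, where 0xDB is the euro sign (before Mac OS 8.5 the same position held the generic currency sign '¤')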
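
One way to test this guess is to decode the bytes from the hexdump in the question. The sketch below assumes Python 3 and its standard-library `mac_roman` codec; the helper `codecs_with_currency_at` and the candidate list are hypothetical, chosen only for illustration, and the list is far from exhaustive.

```python
import unicodedata

# The 16 bytes shown in the hexdump from the question.
sample = bytes.fromhex("71 75 61 6e 74 20 db 32 2e 36 30 0a 20 41 49 4d")

# Decode them as Mac OS Roman.
decoded = sample.decode("mac_roman")
print(repr(decoded))  # 'quant €2.60\n AIM'

# A few single-byte encodings to compare (an arbitrary, illustrative selection).
CANDIDATES = ["latin_1", "iso8859_15", "cp1252", "cp437", "cp850", "mac_roman"]

def codecs_with_currency_at(byte_value, candidates=CANDIDATES):
    """Return the candidate codecs that map byte_value to a currency symbol."""
    hits = []
    for name in candidates:
        try:
            ch = bytes([byte_value]).decode(name)
        except UnicodeDecodeError:
            continue  # the byte is unassigned in this codec
        # Unicode general category "Sc" means "Symbol, currency".
        if unicodedata.category(ch) == "Sc":
            hits.append(name)
    return hits

hits = codecs_with_currency_at(0xDB)
print(hits)
```

If the files really are Mac OS Roman, opening them with `open(path, encoding="mac_roman")` should be enough to parse the amounts; since everything else in the files is ASCII, the rest of the content decodes the same way.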