node.js utf-8 character-encoding iconv

Node.js: how to decode ISO-8859-1 into UTF-8?


I'm querying a database that uses ISO-8859-1, but since Node works with UTF-8, I must convert the data returned by this particular DBMS.

I tried iconv-lite but I can't figure out how to get the desired output. For example, I got 0xc2 0x80 when I expected 0xe2 0x82 0xac to be returned.

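var iconv = require('iconv-lite');

// A single byte, 0x80, as it comes back from the database
var buffer = Buffer.from([0x80]);
var str = iconv.decode(buffer, 'iso-8859-1');
console.log({str});
// Buffer.from replaces the deprecated new Buffer(...)
console.log(Buffer.from(str, 'utf8'));
// iconv.encode takes a string, not a Buffer
iconv.encode('€', 'iso-8859-1');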

/*
Which outputs:
{ str: '' }      (str holds the unprintable control character U+0080)
<Buffer c2 80>
*/
  • In UTF-8, € is represented by 0xe2 0x82 0xac
  • In ISO-8859-1, € is represented by 0x80 (or so I assumed; see the updates below)

Updates:

  • The expected value for € is 0xe2 0x82 0xac, not 0xdb as I initially wrote by mistake.
  • As stated in the comments, ISO-8859-1 doesn't contain a € character.
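
To make the 0xc2 0x80 result concrete, here is a minimal sketch that uses only Node's built-in Buffer rather than iconv-lite. It relies on the fact that Node's 'latin1' encoding is plain ISO-8859-1, where byte N simply becomes code point U+00NN. The variable names are illustrative only.

```javascript
// Byte 0x80 decoded as ISO-8859-1 is the control character U+0080, not the euro sign.
const raw = Buffer.from([0x80]);
const decoded = raw.toString('latin1');
const codePoint = decoded.codePointAt(0).toString(16);

// U+0080 needs two bytes in UTF-8, which is where c2 80 comes from.
const asUtf8 = Buffer.from(decoded, 'utf8');

// The euro sign is U+20AC, hence the three-byte UTF-8 form.
const euroUtf8 = Buffer.from('\u20ac', 'utf8');

console.log(codePoint); // 80
console.log(asUtf8);    // <Buffer c2 80>
console.log(euroUtf8);  // <Buffer e2 82 ac>
```

So the decoder was doing exactly what ISO-8859-1 asks for; the input byte just was not ISO-8859-1 to begin with.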

Solution

  • Thanks to the comments above, I realize that despite my database having a character set of "ISO8859_1", under the hood IBExpert is using WINDOWS-1252 (often called ANSI) and presenting the data to me in that encoding, which explains why I was seeing 0x80 in its hex viewer.

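    Maybe WINDOWS-1252 somehow extends the ISO8859_1 character set? (In effect it does: the two agree everywhere except bytes 0x80 to 0x9F, where ISO-8859-1 has control characters and WINDOWS-1252 has printable ones such as €.)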

    For example, the code below works fine: € is decoded correctly.

    // buffer is the same single-byte Buffer ([0x80]) used in the question
    var str = iconv.decode(buffer, 'WINDOWS-1252');
    console.log({str});
    console.log(Buffer.from(str, 'utf8'));
    // iconv.encode takes a string, not a Buffer
    var str2 = iconv.encode('€', 'WINDOWS-1252');
    console.log({strEncoded: str2});
    /*
    { str: '€' }
    <Buffer e2 82 ac>
    { strEncoded: <Buffer 80> }
    */
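
On the question of how WINDOWS-1252 relates to ISO8859_1, the following sketch spells out the only range where they differ, bytes 0x80 to 0x9F. The helper `decodeWindows1252` and the table name `CP1252_HIGH` are hypothetical and written for illustration; in real code iconv-lite does this job. Passing the five unassigned bytes through as control characters is an assumption taken from the WHATWG Encoding Standard.

```javascript
// Code points for bytes 0x80..0x9F in windows-1252. The bytes the code page
// leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) are passed through as
// C1 control characters here (an assumption, see the note above).
const CP1252_HIGH = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// Hypothetical helper: decode a Buffer of windows-1252 bytes into a string.
// Outside 0x80..0x9F the byte value equals the code point, as in ISO-8859-1.
function decodeWindows1252(buf) {
  let out = '';
  for (const byte of buf) {
    const cp = byte >= 0x80 && byte <= 0x9f ? CP1252_HIGH[byte - 0x80] : byte;
    out += String.fromCodePoint(cp);
  }
  return out;
}

const bytes = Buffer.from([0x80, 0xe9]);
const asLatin1 = bytes.toString('latin1');  // U+0080 followed by é
const asCp1252 = decodeWindows1252(bytes);  // € followed by é

console.log(asLatin1 === '\u0080\u00e9'); // true
console.log(asCp1252);                    // €é
```

Byte 0xe9 (é) decodes the same way under both encodings; only 0x80 changes meaning.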
    

    The weird part is that my database query, which uses node-firebirdlib-fbclient to communicate with my Firebird database, resolves with a string whose symbol value is an unprintable character (shown as '' below), which translates into 0xc2 0x80 in UTF-8.

       { idNumber: 1,
         id: 'EUR',
         taxPercentage: 1,
         isDefault: -1,
         accountNumber: null,
         dontUse: false,
         symbol: '' },
      eur: <Buffer c2 80> }
    

    The eur value is output by console.log(Buffer.from(result.symbol, 'utf8')).

    Decoding this as 'WINDOWS-1252' with iconv.decode(Buffer.from(currency.symbol, 'utf8'), 'WINDOWS-1252') returns:

    ... "defaultCurrency": { "id": "EUR", "symbol": "€", "label": "EUR" }...
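
A possible way to repair such a value, sketched under the assumption that the driver decoded the byte as ISO-8859-1 and symbol therefore holds the single character U+0080: turn the string back into its original bytes with Node's built-in 'latin1' encoding, then decode those bytes as WINDOWS-1252. The helper name `recoverLatin1Bytes` is hypothetical. Note that going through 'utf8' instead yields the two bytes c2 80, and WINDOWS-1252 reads 0xc2 as 'Â', so that route would be expected to give 'Â€' rather than '€'.

```javascript
// Hypothetical helper: recover the original single-byte data from a string that
// was decoded as ISO-8859-1. Throws if the string holds anything above U+00FF,
// because such a character cannot have come from one ISO-8859-1 byte.
function recoverLatin1Bytes(str) {
  for (const ch of str) {
    if (ch.codePointAt(0) > 0xff) {
      throw new RangeError('not an ISO-8859-1 character: ' + ch);
    }
  }
  return Buffer.from(str, 'latin1');
}

// Assumed driver output: the control character U+0080.
const symbol = '\u0080';

const recovered = recoverLatin1Bytes(symbol); // the byte seen in the hex viewer
const viaUtf8 = Buffer.from(symbol, 'utf8');  // two bytes, not the original data

console.log(recovered); // <Buffer 80>
console.log(viaUtf8);   // <Buffer c2 80>
```

The recovered buffer can then be passed to iconv.decode(recovered, 'WINDOWS-1252'), which is the same call that produced '€' for the byte 0x80 earlier in this post.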