
Node.js Request - Can't decode HTML page


I am trying to decode this HTML page using Node.js with the request module: http://www.receita.fazenda.gov.br/PessoaJuridica/CNPJ/cnpjreva/Cnpjreva_Erro.asp

In the browser, the JavaScript console reports the charset as windows-1252:

document.characterSet = "windows-1252";
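In a browser, document.characterSet reflects the encoding the page was decoded with, which comes from the HTTP Content-Type header, a byte order mark, or a <meta> tag. A minimal sketch for checking what the server itself declares from Node, assuming you have the raw header value (Node's http module exposes it lowercased as res.headers['content-type']); charsetFromContentType is a hypothetical helper, not part of request or iconv-lite:

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type
// header value such as 'text/html; charset=ISO-8859-1'.
// Returns the lowercased charset name, or null if none is declared.
function charsetFromContentType(contentType) {
  if (typeof contentType !== 'string') return null;
  // Accepts both charset=foo and charset="foo".
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

console.log(charsetFromContentType('text/html; charset=ISO-8859-1')); // iso-8859-1
console.log(charsetFromContentType('text/html')); // null
```

If the header declares no charset, the browser falls back to a <meta> tag or its own default, so the header alone may not match what document.characterSet shows.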

I tried every encoding available in iconv-lite, but all of them return the wrong text. For example:

var html = iconv.decode(Buffer.from(body), "windows1252");

Does anyone have an idea how to decode this page?

Sample code:

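var request = require('request');
var iconv = require('iconv-lite');

request('http://www.receita.fazenda.gov.br/PessoaJuridica/CNPJ/cnpjreva/Cnpjreva_Erro.asp', function (err, res, body) {
    // body arrives here as a string that request has already decoded as UTF-8
    var html = iconv.decode(Buffer.from(body), "windows1252");
    console.log(html);
});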

Returns:

...
<td valign="middle" align="left"><b><font face="Arial" size="2">
        Acesso n�o permitido.
</td>
...

Decoded string should be:

...
<td valign="middle" align="left"><b><font face="Arial" size="2">
        Acesso não permitido.
</td>
...
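The � in the output is U+FFFD, the Unicode replacement character. Assuming the page is served in a single-byte charset, as the answer below indicates, the ã arrives as the single byte 0xE3 (its value in both ISO-8859-1 and windows-1252). On its own, 0xE3 is not valid UTF-8, so when request decodes the body as UTF-8 by default, that byte is replaced before the callback ever runs. A minimal sketch using only Node's Buffer (no request or iconv-lite involved) shows why converting the string back to bytes afterwards cannot recover it:

```javascript
// The bytes for "não" in a single-byte charset: 'ã' is 0xE3.
const raw = Buffer.from([0x6e, 0xe3, 0x6f]);

// Decoding as UTF-8 (request's default) replaces the invalid 0xE3 with U+FFFD.
const asUtf8 = raw.toString('utf8');
console.log(asUtf8); // n�o

// Re-encoding that string, as Buffer.from(body) or new Buffer(body) does,
// yields the UTF-8 bytes of U+FFFD (ef bf bd); the original 0xE3 is gone.
const reencoded = Buffer.from(asUtf8, 'utf8');
console.log(reencoded); // <Buffer 6e ef bf bd 6f>

// Decoding the original bytes with a single-byte charset recovers the text.
console.log(raw.toString('latin1')); // não
```

This is why no encoding passed to iconv.decode can fix the text once it has been decoded as a string: the decode has to happen on the original bytes.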

Thanks.


Solution

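  • The encoding reported by document.characterSet (windows-1252) is not the one to use; the correct encoding is ISO-8859-1. Note that iconv.decode must receive the raw bytes as a Buffer: request decodes the body as a UTF-8 string by default, so pass encoding: null in the request options to get body as a Buffer.

    body = iconv.decode(body, "ISO-8859-1");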
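The same idea can be sketched with only Node's built-in http module, swapping in Node's 'latin1' encoding (which is ISO-8859-1) for request and iconv-lite. This is an illustration under that assumption, not the answer's code; fetchLatin1 and decodeLatin1 are hypothetical names. The key point is that the response chunks stay Buffers until the whole body has arrived and is decoded once:

```javascript
const http = require('http');

// Hypothetical helper: join raw Buffer chunks and decode them as ISO-8859-1.
// Node's 'latin1' maps each byte to the code point with the same value.
function decodeLatin1(chunks) {
  return Buffer.concat(chunks).toString('latin1');
}

// Hypothetical helper: fetch a plain-HTTP URL and pass the decoded body to callback.
function fetchLatin1(url, callback) {
  http.get(url, (res) => {
    const chunks = [];
    // No res.setEncoding() call, so each chunk is a raw Buffer; appending
    // chunks to a string instead would decode them as UTF-8 and lose 0xE3.
    res.on('data', (chunk) => chunks.push(chunk));
    res.on('end', () => callback(null, decodeLatin1(chunks)));
    res.on('error', callback);
  }).on('error', callback);
}

// Usage (performs a real network request):
// fetchLatin1('http://www.receita.fazenda.gov.br/PessoaJuridica/CNPJ/cnpjreva/Cnpjreva_Erro.asp',
//   (err, html) => { if (err) throw err; console.log(html); });
```

With request and iconv-lite the structure is the same: request({ url: ..., encoding: null }, ...) hands the callback a Buffer, which can then go straight into iconv.decode.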