node.js encoding iconv jsdom

Node.js jsdom encoding


I am trying to parse a site that has the following tag in its header:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

The page contains the following word:

Aflenz - Bürgeralm

My Node.js code:

    // includes
    var jsdom = require("jsdom");
    var fs = require('fs');
    var Buffer = require('buffer').Buffer;
    var Iconv  = require('iconv').Iconv;
    var iconv = new Iconv('iso-8859-1','utf-8');

    // parsing in the callback from jsdom
    var name = $(".name_detail").html();
    console.log("db"+name);
    console.log("db"+iconv.convert(name).toString());

Output over ssh:

dbAflenz - B�rgeralm
dbAflenz - B�rgeralm

Thx in advance


Solution

  • You can try the following (the request module lets you fetch the body in binary format, so you can convert it before jsdom parses it):

    request({uri: url, encoding: 'binary'}, function(err, response, body) {
         ...
         // Rebuild the raw bytes from the binary string, then convert them to UTF-8.
         body = Buffer.from(body, 'binary');
         var iconv = new Iconv('ISO-8859-1', 'UTF-8');
         body = iconv.convert(body).toString();
    });


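    This converts the raw ISO-8859-1 bytes to UTF-8 before they are decoded into a string. In the question, the text had presumably already been decoded with the wrong charset by the time iconv saw it, so the ü was lost and converting afterwards could not bring it back.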
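
To see why the order of decoding matters, here is a minimal sketch that uses only Node's built-in Buffer, with no request or iconv involved. The byte values are typed in by hand for illustration and are not taken from the actual site, and the variable names (raw, wrong, right) are made up for this example. It relies on 'latin1' being Node's name for ISO-8859-1.

```javascript
// Bytes of "Bürgeralm" as an ISO-8859-1 server would send them:
// the ü is the single byte 0xFC. (Hand-written sample data, an assumption.)
var raw = Buffer.from([0x42, 0xFC, 0x72, 0x67, 0x65, 0x72, 0x61, 0x6C, 0x6D]);

// Decoding as UTF-8 is roughly what goes wrong in the question: 0xFC is not
// valid UTF-8, so it is replaced with U+FFFD and the original byte is gone.
var wrong = raw.toString('utf8');

// Decoding as latin1 maps every byte to the code point with the same value,
// so the ü survives.
var right = raw.toString('latin1');

console.log(wrong);
console.log(right);
```

Once the string contains U+FFFD there is nothing left to convert, which is why the conversion has to happen on the raw bytes, as in the answer above.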