javascript, node.js, character-encoding, hebrew, fs

How can I open a Windows-1255 encoded file in Node.js?


I have a file in Windows-1255 (Hebrew) encoding, and I'd like to be able to access it in Node.js.

I tried opening the file with fs.readFile, and it gives me a Buffer that I can't do anything with. I tried setting the encoding to Windows-1255, but that wasn't recognized.

I also checked out the windows-1255 package, but I couldn't decode with that, because fs.readFile gives either a Buffer or a UTF-8 string, and the package requires a 1255-encoded string.

How can I read a Windows-1255-encoded file in Node.js?


Solution

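  • It seems that using the node-iconv package is the best way. Unfortunately, iconv-lite, which is easier to include in your code, did not seem to implement transcoding for CP1255 when I checked, although its current README lists the Windows 125x family among its supported encodings, so that may be worth re-checking.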

    This thread and its answer show a simple example that concisely demonstrates how to use both of these modules.

    Returning to iconv: I had some problems installing it on Debian with an npm prefix, and I submitted an issue to the maintainer here. I managed to work around the problem by running the install with sudo and then using "sudo chown" to return ownership of the installed module to my user.

    I have tested various win-xxxx encodings and code pages that I have access to (Western and Eastern European samples).

    However, I could not verify it with CP1255, even though it is listed among the supported encodings: I do not have that specific code page installed locally, so the text gets mangled. I tried copying some Hebrew script from this page, but the pasted version was always corrupted. I did not dare install the language on my Windows machine for fear of bricking it, sorry.

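    // sample.js
    // Requires the native "iconv" package (node-iconv): npm install iconv
    var Iconv = require('iconv').Iconv;
    var fs = require('fs');
    
    // Convert a Buffer of CP1255 (Windows-1255) bytes into a JavaScript string.
    function decode(content) {
      // TRANSLIT approximates unmappable characters; IGNORE drops what remains.
      var iconv = new Iconv('CP1255', 'UTF-8//TRANSLIT//IGNORE');
      var buffer = iconv.convert(content); // Buffer containing UTF-8 bytes
      return buffer.toString('utf8');
    }
    
    // Read without an encoding so the raw bytes reach the converter untouched.
    console.log(decode(fs.readFileSync('sample.txt')));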
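
As an alternative that avoids a native dependency, Node.js also ships the WHATWG TextDecoder, and builds that include full ICU data (which, as far as I know, is the default for the official binaries in recent releases) accept the windows-1255 label. The following is only a minimal sketch under that assumption: decodeWindows1255 is a hypothetical helper name, and the four sample bytes stand in for the contents of a real file.

```javascript
// Assumption: this Node.js build includes full ICU data. On a small-ICU build,
// the TextDecoder constructor throws a RangeError for 'windows-1255'.
const { TextDecoder } = require('util');

// Hypothetical helper: turn Windows-1255 bytes into a JavaScript string.
function decodeWindows1255(bytes) {
  return new TextDecoder('windows-1255').decode(bytes);
}

// The Hebrew word "shalom" as Windows-1255 bytes (shin, lamed, vav, final mem).
const sampleBytes = Buffer.from([0xf9, 0xec, 0xe5, 0xed]);
const decoded = decodeWindows1255(sampleBytes);
console.log(decoded);
```

For a real file, the Buffer returned by fs.readFileSync('sample.txt') (called without an encoding argument) would be passed to the helper instead of the sample bytes.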
    

    Extra (off topic) explanations for dealing with file encodings, and how to read files through Node.js buffers:

    fs.readFile passes a Buffer to its callback by default, that is, when no encoding is specified.

    // pass an encoding as the optional second argument to get a string
    fs.readFile(file, {encoding:'utf8'}, function(error, string) {
        if (error) throw error;
        console.log('decoded string:', string); // bytes decoded as UTF-8 into a JavaScript string
    });
    

    OR

    // omit the encoding to get the raw Buffer and work with the encoded bytes directly
    fs.readFile(file, function(error, buffer) {
        if (error) throw error;
        console.log(buffer.toString('utf8')); // decode explicitly once you know the encoding
    });
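
To illustrate why the choice between the two forms matters for a file like the one in the question, the sketch below writes four Windows-1255 bytes to a temporary file (a hypothetical stand-in for the real file) and reads them back both ways. Reading with the utf8 encoding replaces the invalid byte sequences with U+FFFD, so the original bytes cannot be recovered afterwards, whereas omitting the encoding keeps them intact for a later decoding step.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical sample: the Hebrew word "shalom" as Windows-1255 bytes.
const original = Buffer.from([0xf9, 0xec, 0xe5, 0xed]);
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cp1255-'));
const file = path.join(dir, 'sample.txt');
fs.writeFileSync(file, original);

// No encoding argument: the result is a Buffer holding the raw bytes.
const rawBytes = fs.readFileSync(file);

// With 'utf8', the bytes are interpreted as UTF-8. These bytes are not valid
// UTF-8, so they come back as U+FFFD replacement characters.
const asUtf8 = fs.readFileSync(file, { encoding: 'utf8' });

console.log('raw bytes preserved:', rawBytes.equals(original));
console.log('utf8 read contains U+FFFD:', asUtf8.includes('\ufffd'));

// Clean up the temporary file and directory.
fs.unlinkSync(file);
fs.rmdirSync(dir);
```

This is why the conversion examples read the file without an encoding and hand the Buffer to the converter.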