Tags: javascript, encoding, utf-8, iconv

Change string encoded in win1250 to utf8


I'm loading a file encoded in windows-1250 (win1250), but when I load it, the text contains characters like p��jemce instead of příjemce (note the diacritics).

I'd like to change the encoding FROM win1250 TO UTF8.
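To see where the � characters most likely come from, here is a small sketch, assuming the file is decoded as UTF-8 (which is what FileReader.readAsText does when no encoding is given). The byte values are the windows-1250 encoding of "příjemce": 0xF8 is ř and 0xED is í, neither forms valid UTF-8 in this position, so the decoder replaces each one with U+FFFD.

```javascript
// "příjemce" as stored in a windows-1250 file (one byte per character).
// 0xF8 is "ř" and 0xED is "í" in windows-1250; the other bytes are plain ASCII.
const win1250Bytes = new Uint8Array([0x70, 0xf8, 0xed, 0x6a, 0x65, 0x6d, 0x63, 0x65]);

// Decoding those bytes as UTF-8 (what readAsText does by default) fails on 0xF8 and 0xED,
// and each invalid byte is replaced with U+FFFD, the "�" replacement character.
const wrong = new TextDecoder('utf-8').decode(win1250Bytes);
console.log(wrong); // p��jemce
```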

I managed to do it in PHP with the following code:

    $content = iconv('windows-1250', 'UTF-8', $content);

but I can't get it to work in JavaScript. I need to do the conversion on the client, without sending the file to the server (so I can't use PHP as an "encoding proxy").
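For comparison with the PHP line, a minimal JavaScript sketch of the same byte-to-byte conversion, assuming a TextDecoder that supports the 'windows-1250' label (modern browsers do; in Node this needs full ICU, which official builds include). The function name win1250ToUtf8 is hypothetical. A JavaScript string has no byte encoding of its own, so usually only the decoding step is needed; the TextEncoder step matters only if you actually need UTF-8 bytes, for example to upload or save the result.

```javascript
// Hypothetical helper (not part of any library): a rough equivalent of
// PHP's iconv('windows-1250', 'UTF-8', $content), working on raw bytes.
function win1250ToUtf8(win1250Bytes) {
  // Step 1: decode windows-1250 bytes into a JavaScript string
  const text = new TextDecoder('windows-1250').decode(win1250Bytes);
  // Step 2: encode that string as UTF-8 bytes (TextEncoder only ever produces UTF-8)
  return new TextEncoder().encode(text);
}

// "příjemce" as stored in a windows-1250 file
const input = new Uint8Array([0x70, 0xf8, 0xed, 0x6a, 0x65, 0x6d, 0x63, 0x65]);
const utf8Bytes = win1250ToUtf8(input);
console.log(Array.from(utf8Bytes, (b) => b.toString(16)).join(' '));
// 70 c5 99 c3 ad 6a 65 6d 63 65
```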

I've tried the iconv-lite and text-encoding libraries (from npm) like this:

    var reader = new FileReader();

    reader.onload = () => {
      var data = reader.result;
      // iconv-lite
      var buf = iconv.encode(data, 'win1250');
      var str1 = iconv.decode(new Buffer(buf), 'utf8');

      // text-encoding
      var uint8array = new TextEncoder('windows-1250').encode(data);
      var str2 = new TextDecoder('utf-8').decode(uint8array);

      console.log(str1);
      console.log(str2);
    };

    reader.readAsText(file);

But neither of them actually converts the encoding correctly. Is there anything I'm missing?
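A sketch of why the text-encoding attempt leaves the text unchanged, assuming the native TextEncoder/TextDecoder found in modern browsers and Node (the npm text-encoding polyfill is believed to default to UTF-8 as well). By the time reader.result is available, readAsText has already turned ř and í into U+FFFD, and TextEncoder can only produce UTF-8, so encoding and then decoding as UTF-8 just round-trips the damaged string. The iconv-lite attempt runs into the same wall: it starts from a string whose original bytes are already gone.

```javascript
// What readAsText handed over: the file was already decoded as UTF-8,
// so ř and í have been replaced by U+FFFD ("�") before any conversion runs.
const data = 'p\uFFFD\uFFFDjemce';

// TextEncoder always encodes to UTF-8; its constructor takes no encoding
// argument, so 'windows-1250' here is silently ignored.
const encoder = new TextEncoder('windows-1250');
console.log(encoder.encoding); // utf-8

// UTF-8 encode followed by UTF-8 decode is a round trip,
// so the damaged text comes back exactly as it went in.
const str2 = new TextDecoder('utf-8').decode(encoder.encode(data));
console.log(str2 === data); // true
```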


Solution

  • I think you could simply use reader.readAsArrayBuffer instead of reader.readAsText:

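    // Assumes a bundler (e.g. browserify or webpack) provides require() and Buffer in the browser
    var iconv = require('iconv-lite');

    var reader = new FileReader();
    reader.onload = () => {
      // With readAsArrayBuffer, reader.result is an ArrayBuffer holding the raw file bytes
      var buf = reader.result;

      // iconv-lite: decode() expects a Buffer, so wrap the ArrayBuffer first
      var str1 = iconv.decode(Buffer.from(buf), 'win1250');

      // text-encoding (or the native TextDecoder): decode() accepts an ArrayBuffer directly
      var str2 = new TextDecoder('windows-1250').decode(buf);

      console.log(str1);
      console.log(str2);
    };

    // `file` is a File/Blob, e.g. from an <input type="file"> element
    reader.readAsArrayBuffer(file);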
    

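    readAsArrayBuffer gives you the raw binary data of the file, so you can decode it as windows-1250 yourself. readAsText, by contrast, decodes the file as UTF-8 when no encoding is given, and the bytes that aren't valid UTF-8 (here ř and í) are replaced with � before your onload handler ever sees them, so no later conversion can bring them back.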

    I don't have a full dev environment set up, so the code above isn't fully tested, but I hope it at least points you in the right direction.
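The same approach can also be written with promises. A minimal sketch, assuming Blob.prototype.arrayBuffer() (available in modern browsers and Node 18+) as a stand-in for FileReader.readAsArrayBuffer, and a TextDecoder that supports 'windows-1250'. The helper name readWin1250 is hypothetical.

```javascript
// Hypothetical helper: read a File/Blob and decode its raw bytes as windows-1250.
async function readWin1250(blob) {
  // Resolves with an ArrayBuffer of the raw bytes; nothing has been decoded yet
  const buffer = await blob.arrayBuffer();
  // Decode the bytes as windows-1250 into a normal JavaScript string
  return new TextDecoder('windows-1250').decode(buffer);
}

// Usage: simulate a windows-1250 file containing "příjemce"
const file = new Blob([new Uint8Array([0x70, 0xf8, 0xed, 0x6a, 0x65, 0x6d, 0x63, 0x65])]);
readWin1250(file).then((text) => console.log(text)); // příjemce
```

In the browser you would pass the File from an <input type="file"> element instead of the simulated Blob. FileReader.readAsText also accepts an encoding label as its second argument, so reader.readAsText(file, 'windows-1250') is another option worth trying.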