javascript, node.js, character-encoding, urlencode, iconv

How to encode a string into windows-1256 using JavaScript


I need to encode an Arabic string into windows-1256 format.

I have found a way to decode a string from windows-1256 back to the original string. I want the reverse of this code:

function decode(string) {
  var array = [...string.matchAll(/%(.{2})/g)].map((groups) => parseInt(groups[1], 16));
  var decoder = new TextDecoder('windows-1256');
  return decoder.decode(Uint8Array.from(array).buffer);
}
console.log(decode('%E3%CD%E3%CF'));
console.log('%C7%E1%DA%E1%E6%E3+%2D%CA%DA%E1%ED%E3+%C7%D3%C7%D3%EC'.split('+').map(decode));


Solution

  • The iconv package on npm claims to do this. Something like this will probably work.

    const Iconv = require('iconv').Iconv;
    
    // `string` is the JavaScript string to convert; convert() returns a Buffer of CP1256 bytes.
    const utfToArabic = new Iconv('UTF-8', 'CP1256');
    const arabic = utfToArabic.convert(string);
    

    Depending on the content of your input string, you may do better by specifying 'CP1256//TRANSLIT//IGNORE' instead of just 'CP1256'. That tells iconv to try to transliterate characters in your input that don't map to your code page, and to ignore the ones it can't transliterate.

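    In JavaScript, strings are sequences of UTF-16 code units, not bytes in some encoding. Text in any other encoding, windows-1256 included, is handled as a buffer of bytes (a Buffer or Uint8Array), which is why convert() returns a Buffer rather than a string.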

    Here's an example of round-trip conversion of a silly Arabic phrase to codepage 1256 and back, using iconv.

    const Iconv = require('iconv').Iconv;
    const eatGlass = 'أنا قادر على أكل الزجاج و هذا لا يؤلمني'
    console.log (eatGlass, eatGlass.length)
    const utfToArabic = new Iconv('UTF-8', 'CP1256')
    const arabicToUtf = new Iconv('CP1256', 'UTF-8')
    try {
      const arabic = utfToArabic.convert(eatGlass)
      console.log(arabic, arabic.length)
      const s = arabicToUtf.convert(arabic).toString()
      console.log (s, s.length)
    }
    catch (err) {
      console.log(err)
    }
    

    This snippet produces this output.

    أنا قادر على أكل الزجاج و هذا لا يؤلمني 39
    <Buffer c3 e4 c7 20 de c7 cf d1 20 da e1 ec 20 c3 df e1 20 c7 e1 d2 cc c7 cc 20 e6 20 e5 d0 c7 20 e1 c7 20 ed c4 e1 e3 e4 ed> 39
    أنا قادر على أكل الزجاج و هذا لا يؤلمني 39
    

    Your %C3%E4%C7+%DE%C7%CF%D1+%DA representation has uppercase hex digits, a leading % on each byte, and + in place of spaces. It is a flavor of URL encoding specific to your application, applied to the windows-1256 bytes. You can convert the buffer you get from Iconv.convert() to a string like that with a function like this.

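    function toHexStringWithMarker (buf, marker = '%' ) {
      const a = []
      // Write a space (0x20) as '+'; write every other byte as marker + two uppercase hex digits.
      // padStart keeps bytes below 0x10 two digits wide (e.g. 0x0A becomes %0A, not %A).
      buf.forEach(c => a.push(c === 0x20 ? '+' : marker + c.toString(16).toUpperCase().padStart(2, '0')))
      return a.join('')
    }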
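
If you would rather not depend on iconv, here is a minimal sketch of the direct reverse of the decode() function in the question. It assumes your runtime's built-in TextDecoder supports 'windows-1256' (browsers do; Node.js needs a build with full ICU, which is an assumption about your environment). The names charToByte and encode1256 are made up for this example. The idea is to decode each of the 256 possible byte values once, build a character-to-byte lookup table from the results, and then percent-encode each character's byte.

```javascript
// Build a reverse lookup table (character -> byte) by decoding
// every possible single byte with the built-in windows-1256 decoder.
const decoder1256 = new TextDecoder('windows-1256');
const charToByte = new Map();
for (let b = 0; b < 256; b++) {
  charToByte.set(decoder1256.decode(Uint8Array.of(b)), b);
}

// Hypothetical helper: encode a JavaScript string as percent-encoded
// windows-1256 bytes, writing spaces as '+' like the question's data.
function encode1256(str) {
  let out = '';
  for (const ch of str) {
    const b = charToByte.get(ch);
    if (b === undefined) {
      // The character has no single-byte representation in windows-1256.
      throw new RangeError('No windows-1256 mapping for ' + JSON.stringify(ch));
    }
    out += b === 0x20 ? '+' : '%' + b.toString(16).toUpperCase().padStart(2, '0');
  }
  return out;
}

console.log(encode1256('أنا قادر')); // %C3%E4%C7+%DE%C7%CF%D1
```

Like the function above, this encodes every byte, including ASCII letters and punctuation, which matches the %2D in the question's sample data. Unlike iconv with //TRANSLIT//IGNORE, it throws on characters that have no windows-1256 mapping instead of substituting or dropping them.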