
Using Rails to Encode Text to UTF-16LE for Windows


I have a PLC application that requires UTF-16LE-encoded text to support accented characters. I build the text and convert it with:

str = Iconv.conv("utf-16le", "utf-8", str)

Then I output the file with the following:

send_data str, :filename => "sp_table.txt", :type => 'text/plain; charset=utf-16le; header=present;', :disposition => 'attachment'

The PLC application is not able to display the characters. If I read the file type properties with file -I, I get the following:

sp_table.txt: application/octet-stream; charset=binary

If I open the file in Notepad in Windows, it displays correctly. If I resave the file through Notepad, selecting Unicode as the encoding, file -I returns:

sp_table.txt: text/plain; charset=utf-16le

Also, after saving through Notepad, I am able to correctly display all characters in the PLC application.

Should I be specifying a different charset when I send the file?


Solution

  • Notepad writes a byte order mark (BOM) at the start of the file, whereas converting to UTF-16LE with Iconv does not. I am a bit iffy on the Ruby syntax, but something like this:

    str = Iconv.conv("utf-16le", "utf-8", "\ufeff" + str)
    

    Or

    str = "\xFF\xFE" + Iconv.conv("utf-16le", "utf-8", str)
    

    Or

    str = "\377\376" + Iconv.conv("utf-16le", "utf-8", str)
    

    Basically, the idea is to put the bytes 0xFF 0xFE (the BOM for little-endian UTF-16) at the beginning of the data before sending it.
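
On Ruby 1.9 and later, where Iconv is deprecated (and gone from the standard library as of 2.0), the same idea can probably be expressed with String#encode instead. This is only a minimal sketch, not tested against the PLC application: `with_utf16le_bom` is a hypothetical helper name, and it assumes the input string is UTF-8.

```ruby
# Hypothetical helper (the name is an assumption, not from the question).
# Assumes `text` is a UTF-8 string.
def with_utf16le_bom(text)
  # U+FEFF is the byte order mark; encoded as UTF-16LE it becomes the
  # bytes 0xFF 0xFE, followed by the encoded text.
  ("\uFEFF" + text).encode("UTF-16LE")
end

data = with_utf16le_bom("caf\u00E9")

# Show the first two bytes to confirm the BOM is present.
puts data.bytes.first(2).map { |b| format("%02X", b) }.join(" ")  # prints "FF FE"
```

The resulting string could then be passed to send_data the same way as in the question.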