I need to read an external file in Ruby.
Running file -i
locally shows
text/plain; charset=utf-16le
I open it with Ruby's CSV using the separator '\t', and a row shows as:
<CSV::Row "\xFF\xFEC\x00a\x00n\x00d\x00i\x00d\x00a\x00t\x00e\x00 \x00n\x00u\
...
row.to_s produces \x000\x000\x000\x001\x00\t\x00E\x00D\x00O
Running puts row
shows the data correctly:
0001 EDOARDO A
...
(the values also show legibly in vim and LibreOffice Calc)
Any suggestions on how to get the data in Ruby? I've tried various combinations of opening the CSV with external_encoding: 'utf-16le', internal_encoding: "utf-8"
etc., but puts
is the only thing that gives legible values.
Inspecting the CSV object also reports its encoding as ASCII-8BIT:
<#CSV io_type:StringIO encoding:ASCII-8BIT lineno:0 col_sep:"\\t" row_sep:"\n" quote_char:"\"" headers:true>
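The symptoms above (a leading "\xFF\xFE", a "\x00" after every character, and an ASCII-8BIT encoding) are consistent with UTF-16LE bytes that Ruby is treating as binary. As a minimal sketch of that situation, assuming the raw bytes are already in a String, relabelling and transcoding them before parsing might look like this. The headers "id" and "name" and the sample row are hypothetical stand-ins, not the real file's contents.

```ruby
require "csv"

# Hypothetical sample: a BOM plus tab-separated text, encoded as UTF-16LE
# and then labelled as binary, mimicking what a StringIO might hand over.
raw = "\uFEFFid\tname\n0001\tEDOARDO A\n"
        .encode("UTF-16LE")
        .force_encoding("ASCII-8BIT")

# Relabel the bytes with their actual encoding (no bytes change),
# transcode to UTF-8, then drop the byte order mark.
text = raw.dup
          .force_encoding("UTF-16LE")
          .encode("UTF-8")
          .delete_prefix("\uFEFF")

rows = CSV.parse(text, col_sep: "\t", headers: true)
first_row = rows.first
puts first_row["name"]
```

force_encoding only changes the label on the bytes, while encode actually converts them, so the order of the two calls matters.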
The file itself was produced as an XLS file. I have uploaded an edited version here (edited in gvim).
The issue was that I was reading from a Paperclip attachment, which needed to have the encoding set (overridden) before saving.
Adding s3_headers in the model worked:
has_attached_file :attachment, s3_headers: lambda { |attachment|
  {
    'content-Type' => 'text/csv; charset=utf-16le'
  }
}
Thanks to Julien for tipping me off that the issue was related to the Paperclip attachment (that solution works for reading the file directly).
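For reference, here is one way a UTF-16LE tab-separated file can be read directly from disk. This is a sketch under assumptions, not necessarily the approach Julien suggested: the temporary file, the headers "id" and "name", and the sample row are all hypothetical and exist only to make the example self-contained.

```ruby
require "csv"
require "tempfile"

# Hypothetical stand-in for the real export: BOM + tab-separated UTF-16LE text.
sample = "\uFEFFid\tname\n0001\tEDOARDO A\n".encode("UTF-16LE")
file = Tempfile.new(["utf16le_sample", ".csv"])
file.close
File.binwrite(file.path, sample)

# "rb:UTF-16LE:UTF-8" means: external encoding UTF-16LE, internal UTF-8,
# so Ruby transcodes while reading.
text = File.open(file.path, "rb:UTF-16LE:UTF-8", &:read)

# The byte order mark survives transcoding as U+FEFF; remove it so it
# does not end up glued to the first header.
text = text.delete_prefix("\uFEFF")

table = CSV.parse(text, col_sep: "\t", headers: true)
first_name = table.first["name"]
puts first_name

file.unlink
```

Parsing from an already transcoded String keeps the CSV options down to col_sep and headers, which avoids having to guess how encoding options are forwarded to the underlying file.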