First of all, this is not a duplicate of this SO question here.
I have a CSV file encoded in Shift-JIS. This is my script to parse the file:
require 'csv'
str1 = '社員番号'
str2 = 'メールアドレス'
str1.force_encoding("Shift_JIS").encode!
str2.force_encoding("Shift_JIS").encode!
file=File.open("SyainInfo.csv", "r:Shift_JIS")
csv = CSV.read(file, headers: true)
p csv[str1]
p csv [str2]
But even after specifying the encoding, I am getting invalid byte sequence in UTF-8 (ArgumentError). Any thoughts? My Ruby is 2.3.0.
First of all, your encoding doesn't look right:
'社員番号'.force_encoding("Shift_JIS").encode!
#=> "\x{E7A4}\xBE\x{E593}\xA1\x{E795}\xAA\x{E58F}\xB7"
force_encoding takes the bytes from str1 and interprets them as Shift JIS, whereas you probably want to convert the string to Shift JIS:
'社員番号'.encode('Shift_JIS')
#=> "\x{8ED0}\x{88F5}\x{94D4}\x{8D86}"
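To make the difference concrete, here is a small sketch comparing the two calls on the same literal. It assumes the source file is UTF-8 (Ruby's default source encoding); the variable names are only illustrative.

```ruby
utf8 = '社員番号' # UTF-8 literal, 3 bytes per character

# force_encoding only changes the encoding label; the bytes stay the same.
relabelled = utf8.dup.force_encoding('Shift_JIS')

# encode transcodes: the characters stay the same, the bytes change.
converted = utf8.encode('Shift_JIS')

same_bytes = relabelled.bytes == utf8.bytes      # relabelling kept the bytes
utf8_size  = utf8.bytesize                       # size of the UTF-8 form
sjis_size  = converted.bytesize                  # size of the Shift JIS form
round_trip = converted.encode('UTF-8') == utf8   # transcoding is reversible here

p same_bytes, utf8_size, sjis_size, round_trip
```

The relabelled string still carries the UTF-8 bytes, so it cannot match a header that was actually read from a Shift JIS file.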
Next, you can pass a filename to CSV.read, so instead of:
file = File.open(filename)
CSV.read(file)
You can just write:
CSV.read(filename)
That said, you could either work with Shift JIS encoded strings:
require 'csv'
str1 = '社員番号'.encode("Shift_JIS")
str2 = 'メールアドレス'.encode("Shift_JIS")
csv = CSV.read('SyainInfo.csv', encoding: 'Shift_JIS', headers: true)
csv[str1]
csv[str2]
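The reason the lookup keys have to be encoded as well: Ruby does not treat a UTF-8 string and a Shift JIS string as equal, even when they spell the same characters. A minimal sketch, with illustrative variable names:

```ruby
utf8_key = '社員番号'
sjis_key = utf8_key.encode('Shift_JIS')

# Different encodings and different bytes, so the strings are not equal.
keys_equal = (utf8_key == sjis_key)

# Transcoding back to UTF-8 makes them comparable again.
equal_after_transcoding = (sjis_key.encode('UTF-8') == utf8_key)

p keys_equal, equal_after_transcoding
```

So with encoding: 'Shift_JIS' the headers are Shift JIS strings, and a UTF-8 key would simply not be found.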
Or – and that's what I would do – you could work with UTF-8 strings by specifying a second encoding:
require 'csv'
str1 = '社員番号'
str2 = 'メールアドレス'
csv = CSV.read('SyainInfo.csv', encoding: 'Shift_JIS:UTF-8', headers: true)
csv[str1]
csv[str2]
encoding: 'Shift_JIS:UTF-8' instructs CSV to read Shift JIS data and transcode it to UTF-8. It's equivalent to passing 'r:Shift_JIS:UTF-8' to File.open.
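If you want to try this without the real file, the following sketch writes a tiny Shift JIS CSV to a temporary file and reads it back transcoded. The sample row and the temp file name are made up for illustration; they are not the contents of SyainInfo.csv.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data standing in for SyainInfo.csv.
sample = "社員番号,メールアドレス\n1001,taro@example.com\n"

file = Tempfile.new(['syain', '.csv'])
begin
  # Write raw Shift JIS bytes, as the real file would contain.
  File.binwrite(file.path, sample.encode('Shift_JIS'))

  # Read as Shift JIS and transcode to UTF-8 on the way in.
  table = CSV.read(file.path, encoding: 'Shift_JIS:UTF-8', headers: true)
ensure
  file.close! # close and delete the temp file
end

ids    = table['社員番号']       # plain UTF-8 key works
emails = table['メールアドレス']

p ids, emails
```

Because the headers arrive as UTF-8, ordinary string literals can be used as keys with no force_encoding or encode calls.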