Tags: ruby-on-rails, ruby, fastercsv

Encoding issue with CSV.parse


My goal is to upload a file containing rows of firstname and lastname, parse it, and create a Person record in the database for each row.

I do the following, and it works fine:

file = CSV.parse(the_file_to_parse)
file.each do |row|
  person = Person.new(:firstname => row[0], :lastname => row[1])
  person.save
end

That is, until the file contains accented characters (French words). Then I get:

Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8:
INSERT INTO "people" ("created_at", "firstname", "lastname",
"updated_at") VALUES (?, ?, ?, ?)

What is the best way to deal with this encoding issue?


Solution

  • You need to open the CSV file with the right encoding. For example:

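    require 'csv'

    # Encoding the file is stored in; adjust to match your data.
    encoding = "ISO-8859-1"

    # "external:internal" reads the file as `encoding` and transcodes each field to UTF-8.
    # The block form closes the file when iteration finishes.
    CSV.foreach("names.csv", encoding: "#{encoding}:UTF-8") do |line|
      puts "#{line[0]} #{line[1]}"
    end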

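    Here's a hint: don't assume the file is UTF-8; check what it really is. That said, the "\xC3" in the error above is the lead byte UTF-8 uses for accented Latin letters such as é, so in this case the data may well be UTF-8 that Ruby has tagged as ASCII-8BIT (binary).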

    The list of encodings that your Ruby supports can be viewed with this command in irb:

    puts Encoding.list.map(&:to_s).sort
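
When the CSV arrives as an in-memory string, as with the CSV.parse call in the question, the bytes can be re-tagged or converted before parsing. The following is a minimal sketch, not a definitive fix: `to_utf8` is a hypothetical helper, the ISO-8859-1 fallback is an assumption about the data, and the sample names stand in for an uploaded file.

```ruby
require 'csv'

# Hypothetical helper: returns a UTF-8 copy of raw uploaded bytes.
# `fallback` is an assumed source encoding, used only when the bytes
# are not already valid UTF-8.
def to_utf8(raw, fallback = "ISO-8859-1")
  text = raw.dup.force_encoding("UTF-8")   # re-tag the bytes, no conversion
  return text if text.valid_encoding?

  raw.dup.force_encoding(fallback).encode("UTF-8")  # convert from the fallback
end

# Simulated upload: UTF-8 bytes tagged as binary (ASCII-8BIT).
upload = "Zoé,Lefèvre\nRené,Côté\n".b

rows = CSV.parse(to_utf8(upload))
rows.each { |first, last| puts "#{first} #{last}" }
```

In the question's code, this would mean passing to_utf8(the_file_to_parse) to CSV.parse, assuming the_file_to_parse holds the raw file contents as a String.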