
Processing A CSV With Every Line Wrapped In Double Quotes


Oh boy, my first Stack Overflow question! One of our clients is sending us a CSV file to process, but the way they're sending it, every single line is wrapped in double quotes:

"example, header, values\r\n"
"example, first, line\r\n"
"example, second, line\r\n"
...
"etc, etc, etc\r\n"

This in turn is causing Ruby to parse every line as a single field, including the headers, which is causing this data ingestion script to crash.
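To illustrate the symptom, here is a minimal sketch using a made-up sample line rather than the client's actual file. It shows that Ruby's CSV parser treats a line wrapped entirely in double quotes as one quoted field, so the commas inside it no longer act as separators:

```ruby
require 'csv'

# Sample lines invented for illustration; the real file contents may differ.
wrapped   = '"example, header, values"'
unwrapped = 'example, header, values'

# The wrapping quotes turn the whole line into a single quoted field.
wrapped_fields   = CSV.parse_line(wrapped)
# Without the quotes, the commas separate three fields.
unwrapped_fields = CSV.parse_line(unwrapped)

puts wrapped_fields.length   # 1
puts unwrapped_fields.length # 3
```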

The code currently opens this as a File object, which then gets passed to a CSV.foreach enumerator with some configurable options:

# file is the File object and options is the options hash shown below
CSV.foreach(file, **options).with_index(1) do |line, index|
  # process a record
end

Is there a straightforward way to tell Ruby to just ignore these quotes so that it can correctly parse individual fields?

I've tried changing the quote_char in the CSV options to a single quote, but somehow that actually makes things worse. I could probably do all sorts of work to remove these quotes from the file before processing it, but that would require making a bunch of changes to legacy code, and I'd like to avoid it if I can. I've gone through some documentation about CSV options, but I'm not seeing any obvious silver bullet.

For reference, the CSV options are configured as such:

{
 headers: true,
 skip_blanks: true,
 encoding: 'bom|utf-8',
 liberal_parsing: true,
 header_converters: lambda { |f| f.downcase.strip },
 row_sep: "\r\n",
 quote_char: "'"
}

Solution

  • You will have to do a little pre-processing on the file before parsing the CSV, like this:

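    # test.csv
    "status,color,name\r\n"
    "active,green,Norm\r\n"
    "inactive,red,Herb"
    
    # test.rb
    require 'csv'
    
    # Read the raw lines; each is wrapped in double quotes and carries a literal \r\n.
    not_csv = File.readlines('test.csv')
    
    # Drop the literal "\r\n" text and every double quote, keeping the real line breaks.
    # Note: this also strips quotes that legitimately belong to a field.
    real_csv = not_csv.map { |line| line.sub("\\r\\n", "").gsub('"', '') }.join
    
    parsed_csv = CSV.parse(real_csv, headers: true)
    puts parsed_csv[0]["status"] #=> active
    puts parsed_csv[1]["name"]   #=> Herb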
    

    From the console, run ruby test.rb.
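A variation on the same idea, offered as a sketch rather than a definitive fix: strip only the outer quote pair and the trailing literal `\r\n` from each line, then keep the `with_index(1)` loop shape from the question. The helper name `unwrap_quoted_lines` is hypothetical, the sample data is the answer's `test.csv` inlined as a string, and the code assumes the `\r\n` in the file is literal text (a backslash followed by a letter), as the answer does. Only some of the question's options are carried over.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: removes the leading quote, the trailing quote, and an
# optional literal "\r\n" (backslash text, not a real CRLF) from every line.
def unwrap_quoted_lines(io)
  io.each_line.map do |line|
    line.chomp.sub(/\A"/, '').sub(/(?:\\r\\n)?"\z/, '')
  end.join("\n")
end

# Stand-in for the client's file; the single-quoted heredoc keeps \r\n literal.
raw = StringIO.new(<<~'DATA')
  "status,color,name\r\n"
  "active,green,Norm\r\n"
  "inactive,red,Herb"
DATA

# A subset of the options from the question, minus quote_char and row_sep.
options = {
  headers: true,
  skip_blanks: true,
  header_converters: lambda { |f| f.downcase.strip }
}

rows = CSV.parse(unwrap_quoted_lines(raw), **options)

rows.each.with_index(1) do |row, index|
  puts "#{index}: #{row['status']} #{row['name']}"
end
```

Because only the outer quotes are removed, any quotes inside a field are left for the CSV parser to handle. With a real file, `File.open('test.csv')` could be passed in place of the `StringIO`.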