Search code examples
mysqlrubycsvfastercsv

Ruby CSV read multiline fields


I exported tables and queries from SQL, where some of the fields are multi-line.

The Ruby (1.9+) way to read CSV appears to be:

require 'csv'

CSV.foreach("exported_mysql_table.csv", {:headers=>true}) do |row|
    puts row
end

Which works great if my data is like this:

"id","name","email","potato"
1,"Bob","[email protected]","omnomnom"
2,"Charlie","[email protected]","andcheese"
4,"Doug","[email protected]","usemeltattack"

(The first line is the header/attributes)

But if I have:

"id","name","address","email","potato"
1,"Bob","--- 
- 101 Cottage row
- Lovely Village
- \"\"
","[email protected]","omnomnom"
2,"Charlie","--- 
- 102 Flame Street
- \"\"
- \"\"
","[email protected]","andcheese"
4,"Doug","--- 
- 103 Dark Cave
- Next to some geo dude
- So many bats
","[email protected]","usemeltattack"

Then I get the error:

.rbenv/versions/1.9.3-p194/lib/ruby/1.9.1/csv.rb:1894:in `block (2 levels) in shift': Missing or stray quote in line 2 (CSV::MalformedCSVError)

This seems to be because the end of the line doesn't have a close quote, as it spans several lines.

(I tried 'FasterCSV', that gem became 'csv' since ruby 1.9)


Solution

  • Your problem is not the multiline but malformed CSV.

    Replace the \" and end space after a line end like this:

    require 'csv' 
    
    ml = %q{"id","name","address","email","potato" 
    1,"Bob","---  
    - 101 Cottage row 
    - Lovely Village 
    - \"\" 
    ","[email protected]","omnomnom" 
    2,"Charlie","---  
    - 102 Flame Street 
    - \"\" 
    - \"\" 
    ","[email protected]","andcheese" 
    4,"Doug","---  
    - 103 Dark Cave 
    - Next to some geo dude 
    - So many bats 
    ","[email protected]","usemeltattack"}
    
    ml.gsub!(/\" \n/,"\"\n").gsub!(/\\\"/,"__")
    
    CSV.parse(ml, {:headers=>true}) do |row|
      puts row
    end
    

    This gives:

    "id","name","address","email","potato"
    1,"Bob","---  
    - 101 Cottage row 
    - Lovely Village 
    - ____
    ","[email protected]","omnomnom"
    etc
    

    If you have no control over the program that delivers the CSV, you have to open the file, read the contents, do a replace and then parse the CSV. I use __ here but you can use other non-conflicting characters.