I exported tables and queries from SQL, where some of the fields are multi-line.
The Ruby (1.9+) way to read CSV appears to be:
require 'csv'
CSV.foreach("exported_mysql_table.csv", {:headers=>true}) do |row|
puts row
end
Which works great if my data is like this:
"id","name","email","potato"
1,"Bob","[email protected]","omnomnom"
2,"Charlie","[email protected]","andcheese"
4,"Doug","[email protected]","usemeltattack"
(The first line is the header/attributes)
But if I have:
"id","name","address","email","potato"
1,"Bob","---
- 101 Cottage row
- Lovely Village
- \"\"
","[email protected]","omnomnom"
2,"Charlie","---
- 102 Flame Street
- \"\"
- \"\"
","[email protected]","andcheese"
4,"Doug","---
- 103 Dark Cave
- Next to some geo dude
- So many bats
","[email protected]","usemeltattack"
Then I get the error:
.rbenv/versions/1.9.3-p194/lib/ruby/1.9.1/csv.rb:1894:in `block (2 levels) in shift': Missing or stray quote in line 2 (CSV::MalformedCSVError)
This seems to be because the end of the line doesn't have a close quote, as it spans several lines.
(I tried 'FasterCSV', that gem became 'csv' since ruby 1.9)
Your problem is not the multiline but malformed CSV.
Replace the \"
and end space after a line end like this:
require 'csv'
ml = %q{"id","name","address","email","potato"
1,"Bob","---
- 101 Cottage row
- Lovely Village
- \"\"
","[email protected]","omnomnom"
2,"Charlie","---
- 102 Flame Street
- \"\"
- \"\"
","[email protected]","andcheese"
4,"Doug","---
- 103 Dark Cave
- Next to some geo dude
- So many bats
","[email protected]","usemeltattack"}
ml.gsub!(/\" \n/,"\"\n").gsub!(/\\\"/,"__")
CSV.parse(ml, {:headers=>true}) do |row|
puts row
end
This gives:
"id","name","address","email","potato"
1,"Bob","---
- 101 Cottage row
- Lovely Village
- ____
","[email protected]","omnomnom"
etc
If you have no control over the program that delivers the CSV, you have to open the file, read the contents, do a replace and then parse the CSV. I use __
here but you can use other non-conflicting characters.