
MalformedCSVError with rails CSV (FasterCSV)


I'm having serious trouble parsing some CSV in Rails. My app lets a user upload a CSV file, converts the file to UTF-8, then attempts to parse and process it. Whenever the app attempts to parse it, however, I get a MalformedCSVError stating "Illegal quoting on line 1".

What I don't get is that if I copy the contents of the original file into a new document and save it, I can parse that copy in a Rails console without a problem.

If I attempt to parse the original file, it complains about an invalid character for UTF-8 encoding (the file isn't in UTF-8, which is why the app converts it).

If I attempt to parse the file that the app has converted to UTF-8, with its line endings changed to LF, it fails to parse.

If I do a file diff between the version the app has produced and the copy/paste version that I made (which works), there are 0 differences, so I really can't figure out why one is parsable and the other is not.
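When a text diff reports nothing, comparing the raw leading bytes of the two files can show invisible characters that a diff tool or editor may hide. The following is only a minimal sketch of that check; `leading_bytes_hex` is a hypothetical helper name, not something from the app or from any library.

```ruby
# Hypothetical helper: returns the first +count+ bytes of a file as
# space-separated hex pairs, so invisible leading bytes become visible.
def leading_bytes_hex(path, count = 8)
  bytes = File.binread(path, count) || "" # binread may return nil for an empty file
  bytes.unpack("C*").map { |b| format("%02x", b) }.join(" ")
end
```

Calling `leading_bytes_hex` on both the app-converted file and the copy/pasted file and comparing the two strings would show whether they really start with the same bytes.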

Any suggestions? My app processes the file as follows:

def create
  @survey = Survey.new(params[:survey])

  # Now we need to try and convert this to UTF-8 if it isn't already
  encoded = File.read(@survey.survey_data.current_path)
  encoding = CharlockHolmes::EncodingDetector.detect(encoded)

  # We've got a guess at the encoding, so we can try to convert it,
  # but it may still fail, so we need to handle that
  begin
    re_encoded = CharlockHolmes::Converter.convert(encoded, encoding[:encoding], 'UTF-8')
    re_encoded = re_encoded.gsub(/\r\n?/, "\n")

    # Now replace the uploaded file
    File.open(@survey.survey_data.current_path, 'w') { |f| f.write(re_encoded) }
  rescue ArgumentError
    puts "UH OH!!!!!"
  end

  puts @survey.survey_data.current_path
  @parsed = CSV.read(@survey.survey_data.current_path)
end

The file uploading gem is CarrierWave if that makes any difference.

Please can someone help me as this is driving me insane!

Edit

The error says it's on line 1. Line 1 (assuming line numbers aren't indexed from 0) is:

"Survey","RD","GarrysMDs","NigelsMDs","PaulsMDs","StephensMDs","BrinleyJ","CarolineP","DaveL","GrantR","GregS","Kent","NeilC","NicolaP","AndyC","DarrenS","DeanB","KarenF","PaulR","RichardF","SteveG","BrianG","GordonA","NickD","NickR","NickT","RayL","SimonH","EdmondH","JasonF","MikeS","SamanthaN","TimB","TravisF","AlanS","Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8PM","Q8N","Q9","Q10","Q11","Q12","Q13","Q14","Q15","Q16PM","Q16N","Q17PM","Q17N","Q18PM","Q18N","Q19","Q20","Q21","Q22","comment","Q23.1","Q23.2","Q23.3","TQ23.1","TQ23.2","VPM","VN","VQ1","VQ2","VQ3","VQ4","VQ5","VQ6","VQ7","VQ8N","VQ8PM","VQ9","VQ10","VQ11","VQ12","VQ13","VQ14","VQ15","VQ16","VQ16N","VQ16PM","VQ17","VQ17N","VQ17PM","VQ18","VQ18N","VQ18PM","VQ19","VQ20","VQ21","VQ22","VQ23.1","VQ23.2","VQ23.3","VRD","XQ16","XQ17","XQ18"

Solution

  • Well that was irritating!

    It turns out the file had a BOM (byte order mark), which was causing the CSV parser to break. Loading the file with

    CSV.open("path/to/file.csv", "rb:bom|encoding")
    

    allowed it to parse perfectly! (Here `encoding` stands for the file's actual encoding name, e.g. `utf-8`.) I'm annoyed at how long it took to track down, but it's now working, and there's no longer any need to convert to UTF-8 either!
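    For illustration, a minimal sketch of the same idea using Ruby's documented `BOM|UTF-8` mode string is shown here. It assumes the file is UTF-8 with an optional BOM; `read_csv_skipping_bom` is a hypothetical helper name, and a file in another encoding would need a different mode string.

```ruby
require "csv"

# Hypothetical helper: parses a CSV file, skipping a leading UTF-8 BOM if present.
# Assumption: the file is UTF-8 encoded.
def read_csv_skipping_bom(path)
  # "BOM|UTF-8" tells Ruby to check for a byte order mark and drop it
  # before the contents reach the CSV parser.
  File.open(path, "r:BOM|UTF-8") { |io| CSV.parse(io.read) }
end
```

    In the controller above, something like `@parsed = read_csv_skipping_bom(@survey.survey_data.current_path)` would take the place of the plain `CSV.read` call, under the same UTF-8 assumption.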