
Quoted fields escaped in smarter_csv


I have a CSV file that I'm trying to process with smarter_csv in Ruby. Every field is wrapped in double quotes, and some fields contain nested double quotes that are not escaped. I'm using :force_simple_split => true, as suggested in the documentation for this situation. However, when I process the file, every field still contains escaped quotes. What am I doing wrong here?

The CSV file was generated on a Windows server and looks something like this...

header1,header2,header3
"field1, There are "nested quotes" here.","field2", "field3"
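For comparison, a strict parser rejects this data line outright: RFC 4180 requires a double quote inside a quoted field to be escaped by doubling it (""nested quotes""), and these are not. A minimal check using Ruby's standard `csv` library (not smarter_csv), applied only to the data line shown above:

```ruby
require 'csv'

# The sample data line from above, copied verbatim into a single-quoted Ruby string.
line = '"field1, There are "nested quotes" here.","field2", "field3"'

begin
  CSV.parse_line(line)
  strict_ok = true
rescue CSV::MalformedCSVError => e
  # The strict parser stops at the first unescaped quote inside a quoted field.
  strict_ok = false
  puts "Strict parse failed: #{e.message}"
end
```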

I open the file with smarter_csv like so...

c = SmarterCSV.process('myfile.csv', force_simple_split: true, file_encoding: "iso-8859-1", row_sep: "\r")

Then I get output like this...

{:header1=>"\"field1, There are \"nested quotes\"", :header2=>"\"field2\"", :header3=>"\"field3\"" }
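A side note that may explain part of the confusion: the backslashes in that output are most likely an artifact of how Ruby displays strings, not characters in the data. `inspect` (used by irb and `p` when printing a hash) shows each literal `"` inside a string as `\"`. A small illustration, using a value modeled on `:header2` above:

```ruby
# A value like the one reported above: the text field2 wrapped in literal double quotes.
value = '"field2"'

puts value          # prints "field2" (the quote characters, no backslashes)
puts value.inspect  # prints "\"field2\"" (inspect adds outer quotes and escapes the inner ones)

# The string itself contains no backslash characters.
has_backslash = value.include?('\\')
```

So the fields contain leftover quote characters rather than backslash escapes.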


Solution

    > SmarterCSV::VERSION
     => "1.0.19"
    > options = {:force_simple_split => true}
    > ap SmarterCSV.process('/tmp/quoted.csv', options)
    [
        {
            :header1 => "\"field1",
            :header2 => "There are \"nested quotes\" here.\"",
            :header3 => "\"field2\""
        }
    ]
    

force_simple_split does exactly that: it splits each line on :col_sep without regard to quotes, so it does not allow for embedded :col_sep characters in your data.

Please note that your example CSV data contains an embedded comma inside the first quoted field, so a simple split of the data line results in four fields, not three.

Because you don't provide a fourth header, the fourth field is ignored.
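The splitting behavior described above can be imitated in a few lines of plain Ruby. This is a rough sketch with a hypothetical helper `simple_split_row`, not smarter_csv's actual implementation; it strips whitespace around each field only so that the result matches the irb output above:

```ruby
# Hypothetical helper imitating a simple split (NOT smarter_csv's code):
# split on the column separator without looking at quotes, strip whitespace,
# and pair fields with headers by position.
def simple_split_row(header_line, data_line, col_sep: ',')
  headers = header_line.split(col_sep).map { |h| h.strip.to_sym }
  fields  = data_line.split(col_sep).map(&:strip)
  # zip walks the headers, so any field beyond the last header is dropped.
  [fields.length, headers.zip(fields).to_h]
end

header_line = 'header1,header2,header3'
data_line   = '"field1, There are "nested quotes" here.","field2", "field3"'

field_count, row = simple_split_row(header_line, data_line)
puts field_count # => 4
p row
```

The resulting hash has the same three keys and values as the `ap` output above, while `field_count` shows the fourth field that gets silently dropped.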

Please open an issue to request correct handling of quoted data like that in your example.
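Until such data is handled, one possible workaround is to pre-split these lines yourself. The sketch below is a heuristic, not a CSV parser: the helper `split_quoted_fields` is hypothetical, and it assumes that every field is wrapped in double quotes and that no field contains a quote, comma, quote sequence of its own.

```ruby
# Hypothetical workaround under two assumptions:
#   (a) every field is wrapped in double quotes, and
#   (b) no field contains a quote-comma-quote sequence itself.
# Split on the boundary between quoted fields (spaces allowed around the
# comma), after dropping the outermost quote at each end of the line.
def split_quoted_fields(line)
  inner = line.strip.delete_prefix('"').delete_suffix('"')
  inner.split(/"\s*,\s*"/)
end

fields = split_quoted_fields('"field1, There are "nested quotes" here.","field2", "field3"')
p fields # => ["field1, There are \"nested quotes\" here.", "field2", "field3"]
```

To apply it to the file from the question, one could read lines with `File.foreach('myfile.csv', "\r", encoding: 'iso-8859-1')` (mirroring the `row_sep` and `file_encoding` options used above), handle the unquoted header line separately, and pair each result with the headers.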