Search code examples
ruby-on-railsrubycsvrake-task

Ruby: CSV parser tripping over double quotation in my data


I 'm working on a daily scheduled rake task that will download a CSV that is automatically sent to Dropbox every day, parse it and save to the database. I don't have control over the way the data is entered into the program that generates the CSV reports for this, so I can't avoid there being double quotes used in some of the data. However, I am wondering if there is a way I can strip or replace them with single quotes within the rake task or somehow inform the parser so it doesn't throw this error.

Rake task code:

require 'net/http'
require 'csv'
require 'open-uri'

namespace :fp_import do
    desc "download abc_relations from dropbox, save as csv, create or update record in db"
    task :fp => :environment do
        data = URI.parse("<<file's dropbox link>>").read

       File.open(Rails.root.join('lib/assets', 'fp_relation.csv'), 'w') do |file|
         file.write(data)
       end

       file= Rails.root.join('lib/assets', 'fp_relation.csv')

        CSV.foreach(file) do |row|
            div, fg_style, fg_color, factory, part_style, part_color, comp_code, vendor, design_no, comp_type = row
            fg_sku = fg_style + "-" + fg_color
            part_sku = part_style + "-" + part_color

            relation = FgPart.where('part_sku LIKE ? AND fg_sku LIKE?', "%#{part_sku}%", "%#{fg_sku}%").exists?
            if relation == false

                FgPart.create(fg_style: fg_style, fg_color: fg_color, fg_sku: fg_sku, factory: factory, part_style: part_style, part_color: part_color, part_sku: part_sku, comp_code: comp_code, comp_type: comp_type, design_no: design_no)
            end
        end
    end
end

There are about 35,000 rows in this CSV. Below is a sample. You can see the double quotes in the 4th row of the sample.

Sample data:

"01","502210","018","ZH","5931","001","M","","UPHOLSTERED GLIDER A","RM"
"01","502310","053","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","502310","065","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","502312","424","ZH","25332","NO","O","","UPHOLSTERED GLIDER"AUS"","BAG"
"01","503210","277","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","503310","076","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","506210","018","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","506210","467","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","507610","932","AZ","25332","NO","O","","GLIDER","BAG"
"01","507610","932","AZ","5936","001","M","","GLIDER","RM"

Solution

  • The source CSV is malformed, quotes should be escaped before.

    I would edit the file before parsing it with CSV and remove quotes between commas, and replace double quotes with simple ones, you can create a new file in case you don't want to edit the original.

    def fix_csv(file)
      out = File.open("fixed_"+file, 'w')
      File.readlines(file).each do |line|
        line = line[1...-2] #remove beggining and end quotes
        line.gsub!(/","/,",") #remove all quotes between commas
        line.gsub!(/"/,"'") #replace double quotes to single
        out << line +"\n" #add the line plus endline to output
      end
    
      out.close
      return "fixed_"+file
    end
    

    In case you want to modify the same CSV file, you can do it this way:

    require 'tempfile'
    require 'fileutils'
    
    def modify_csv(file)
      temp_file = Tempfile.new('temp')
      begin
        File.readlines(file).each do |line|
          line = line[1...-2]
          line.gsub!(/","/,",")
          line.gsub!(/"/,"'")
          temp_file << line +"\n"
        end
        temp_file.close
        FileUtils.mv(temp_file.path, file)
      ensure
        temp_file.close
        temp_file.unlink
      end
    end
    

    This is explained here in case you want to take a look, this will fix or sanitize your original CSV file