I 'm working on a daily scheduled rake task that will download a CSV that is automatically sent to Dropbox every day, parse it and save to the database. I don't have control over the way the data is entered into the program that generates the CSV reports for this, so I can't avoid there being double quotes used in some of the data. However, I am wondering if there is a way I can strip or replace them with single quotes within the rake task or somehow inform the parser so it doesn't throw this error.
Rake task code:
require 'net/http'
require 'csv'
require 'open-uri'
namespace :fp_import do
desc "download abc_relations from dropbox, save as csv, create or update record in db"
task :fp => :environment do
data = URI.parse("<<file's dropbox link>>").read
File.open(Rails.root.join('lib/assets', 'fp_relation.csv'), 'w') do |file|
file.write(data)
end
file= Rails.root.join('lib/assets', 'fp_relation.csv')
CSV.foreach(file) do |row|
div, fg_style, fg_color, factory, part_style, part_color, comp_code, vendor, design_no, comp_type = row
fg_sku = fg_style + "-" + fg_color
part_sku = part_style + "-" + part_color
relation = FgPart.where('part_sku LIKE ? AND fg_sku LIKE?', "%#{part_sku}%", "%#{fg_sku}%").exists?
if relation == false
FgPart.create(fg_style: fg_style, fg_color: fg_color, fg_sku: fg_sku, factory: factory, part_style: part_style, part_color: part_color, part_sku: part_sku, comp_code: comp_code, comp_type: comp_type, design_no: design_no)
end
end
end
end
There are about 35,000 rows in this CSV. Below is a sample. You can see the double quotes in the 4th row of the sample.
Sample data:
"01","502210","018","ZH","5931","001","M","","UPHOLSTERED GLIDER A","RM"
"01","502310","053","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","502310","065","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","502312","424","ZH","25332","NO","O","","UPHOLSTERED GLIDER"AUS"","BAG"
"01","503210","277","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","503310","076","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","506210","018","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","506210","467","ZH","25332","NO","O","","UPHOLSTERED GLIDER","BAG"
"01","507610","932","AZ","25332","NO","O","","GLIDER","BAG"
"01","507610","932","AZ","5936","001","M","","GLIDER","RM"
The source CSV is malformed, quotes should be escaped before.
I would edit the file before parsing it with CSV and remove quotes between commas, and replace double quotes with simple ones, you can create a new file in case you don't want to edit the original.
def fix_csv(file)
out = File.open("fixed_"+file, 'w')
File.readlines(file).each do |line|
line = line[1...-2] #remove beggining and end quotes
line.gsub!(/","/,",") #remove all quotes between commas
line.gsub!(/"/,"'") #replace double quotes to single
out << line +"\n" #add the line plus endline to output
end
out.close
return "fixed_"+file
end
In case you want to modify the same CSV file, you can do it this way:
require 'tempfile'
require 'fileutils'
def modify_csv(file)
temp_file = Tempfile.new('temp')
begin
File.readlines(file).each do |line|
line = line[1...-2]
line.gsub!(/","/,",")
line.gsub!(/"/,"'")
temp_file << line +"\n"
end
temp_file.close
FileUtils.mv(temp_file.path, file)
ensure
temp_file.close
temp_file.unlink
end
end
This is explained here in case you want to take a look, this will fix or sanitize your original CSV file