I'm parsing a CSV file that I've pulled from an FTP site, and I want to extract some specific fields to store in the database. The file contains an encoding I don't understand, and I believe CSV.parse isn't expecting that encoding either:
require 'csv'

filename = "#{RAILS_ROOT}/spec/files/20120801.01.001.CSV"
filestream = File.new(filename, "r")
while (line = filestream.gets)
  puts "line: #{line}"
  CSV.parse(line) do |row|
    case row[0]
    when "RH"
      # do something
    when "SH"
      # do something else
    end
  end
end
filestream.close
The first line in the CSV file looks something like this:
"\376\377\000\"\000R\000H\000\"\000,\0002\0000\0004\0005\000/\0000\0008\000/\0000\0002\000 \0000\0005\000:\0005\0007\000:\0002\0001\000 \000-\0000\0007\0000\0000\000,\0002\0000\0001\0002\000/\0000\0008\000/\0000\0001\000 \0000\0000\000:\0000\0000\000:\0000\0000\000 \000-\0000\0004\0000\0000\000,\0002\0000\0001\0002\000/\0000\0008\000/\0000\0001\000 \0002\0003\000:\0005\0009\000:\0001\0004\000 \000-\0000\0007\0000\0000\000,\000\"\000Y\0003\000B\0003\0003\000Z\000N\000K\000A\000U\000B\000H\000N\000\"\000,\0000\0000\0001\000,\000\n"
I have a different CSV file that I created myself, and it prints out as human-readable text. What am I missing here? Do I need to apply some encoding to the CSV string before passing it to CSV.parse?
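The leading bytes \376\377 are 0xFE 0xFF, the byte-order mark (BOM) for UTF-16 big-endian, and every ASCII character after it is preceded by a \000 byte, which is how UTF-16BE stores plain ASCII text. A minimal sketch, assuming the file really is UTF-16BE and contains only characters from the Basic Multilingual Plane (no surrogate pairs): it checks for the BOM and re-encodes the data to UTF-8 with unpack/pack, which should work on Ruby 1.8.7 as well as later versions. The helper name utf16be_to_utf8 is hypothetical.

```ruby
require 'csv'

# Hypothetical helper: if the data starts with a UTF-16BE BOM (0xFE 0xFF),
# decode it to UTF-8. Assumes BMP-only text (no surrogate pairs).
def utf16be_to_utf8(data)
  return data unless data.unpack("C2") == [0xFE, 0xFF]  # no BOM: unchanged
  code_units = data.unpack("n*")   # big-endian 16-bit code units
  code_units.shift                 # drop the BOM itself (U+FEFF)
  code_units.pack("U*")            # re-encode the code points as UTF-8
end

# Shortened version of the first line shown above.
raw = "\376\377\000\"\000R\000H\000\"\000,\0000\0000\0001\000\n"
text = utf16be_to_utf8(raw)
p text                    # => "\"RH\",001\n"
p CSV.parse(text).first   # => ["RH", "001"]
```

On 1.8.7 the standard Iconv library (Iconv.conv("UTF-8", "UTF-16", data)) is another option, and on 1.9+ String#encode can do the conversion. Either way, converting before calling CSV.parse means the parser only sees ordinary commas and quotes.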
Here's the stacktrace:
CSV::IllegalFormatError
/Users/project/app/models/parse_csv.rb:5:in `parse'
I'm forced to use Ruby 1.8.7 at the moment.
I know that I could use CSV.open, but I'm intentionally feeding CSV.parse from an IO stream so that I can use SFTP to stream the CSV files from the FTP site into memory without storing them on disk:
data = nil # declared outside the block so it is still visible afterwards
sftp.open_handle("/path/to/remote.file") do |handle|
  data = sftp.read(handle)
end
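Once the whole file is in memory, CSV.parse can take the string directly, so nothing has to touch the disk. A minimal sketch, assuming `data` has already been converted to UTF-8 (or is plain ASCII); rows_by_type is a hypothetical helper that groups the "RH" and "SH" records the original loop dispatches on, and the sample rows are made up.

```ruby
require 'csv'

# Hypothetical helper: groups rows by their record type (the first field).
# Assumes `data` is a UTF-8 or ASCII String already held in memory.
def rows_by_type(data)
  records = { "RH" => [], "SH" => [] }
  CSV.parse(data) do |row|
    case row[0]
    when "RH" then records["RH"] << row   # header record
    when "SH" then records["SH"] << row   # second record type
    end                                   # any other type is skipped
  end
  records
end

sample = "\"RH\",a,b\n\"SH\",c\n\"XX\",d\n"   # made-up sample rows
p rows_by_type(sample)["SH"]   # => [["SH", "c"]]
```

Parsing the whole string in one call, rather than line by line, also copes with quoted fields that contain newlines, which per-line parsing would split incorrectly.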
Thanks in advance for any ideas!
The line contains double quotes, which may need to be escaped. I found this on ruby-forum.com.
It's just a guess, but maybe you could try replacing every double-quote character that is neither preceded nor followed by a comma with a single quote. Something like the untested code below:
line = line.gsub(/([^,])"([^,])/, "\\1'\\2") # keep the characters on either side of the quote
It would probably require reading the whole file first, writing out a corrected version, and then calling the CSV methods on that, but it beats doing it by hand :).
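The replacement is easy to check on a single made-up line before running it over a whole file. A minimal sketch: soften_inner_quotes is a hypothetical helper, and its capture groups (\\1, \\2) write the characters on either side of the quote back, whereas a pattern without them replaces all three matched characters with a single quote.

```ruby
# Hypothetical helper: turns a double quote that is neither preceded nor
# followed by a comma into a single quote, keeping its neighbours.
def soften_inner_quotes(line)
  line.gsub(/([^,])"([^,])/, "\\1'\\2")
end

line = %q{"RH",say "hi" now,001}
p soften_inner_quotes(line)     # => "\"RH\",say 'hi' now,001"
p line.gsub(/[^,]"[^,]/, "'")   # => "\"RH\",say''now,001" (neighbours lost)
```

Quotes at the very start or end of a line are not matched, because the pattern needs a character on both sides.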
Also, as an aside, I think instead of
while (line = filestream.gets)
you could write
filestream.each_line do |line|
which is arguably more idiomatic Ruby.
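A small check of the difference, using a StringIO in place of the opened file (an assumption made so the example is self-contained; File objects respond to each_line and gets the same way): each_line walks every line, while gets returns only the first line, so calling each_line on its result sees just that one line.

```ruby
require 'stringio'

# StringIO stands in for the File opened in the question.
io = StringIO.new("\"RH\",1\n\"SH\",2\n\"SH\",3\n")

all_lines = []
io.each_line { |line| all_lines << line }         # iterates every line

io.rewind
first_only = []
io.gets.each_line { |line| first_only << line }   # gets returned one line

p all_lines.size    # => 3
p first_only.size   # => 1
```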