I have two large-ish CSV files. One file is just a list of records. The other file is a list of records, but the first column is the line number of the record that it modifies in the other file. It doesn't replace the whole row; It just replaces the value in the row that has the matching header.
For example:
File 1:
"First","Last","Lang"
"John","Doe","Ruby"
"Jane","Doe","Perl"
"Dane","Joe","Lisp"
File 2:
"Seq","Lang"
2,"Ruby"
The goal is to end up with one file that looks like this:
"First","Last","Lang"
"John","Doe","Ruby"
"Jane","Doe","Ruby"
"Dane","Joe","Lisp"
The data is, however, much more complicated than that and could even contain line breaks in the CSV. Thus, I can't rely on the line number and instead I have to rely on the record count. (Unless, of course, I preprocess both files to replace newlines and carriage returns.. which I suppose is possible but less interesting.)
The question I have is how do I loop through both files and do the proper replacement without loading either of the entire files into memory. I believe 100mb+ files loaded into memory is a bad idea, right?
Also, the records in the resulting file should be in the same order when it's done.
You will need 2 enumerators, but since they're not nested, one will need to use Enumerator#next, which means you need to be careful about it raising a EOF exception:
e = CSV.open('file2.csv', :headers => true).each
seq = e.next
output = CSV.open('output.csv', 'w')
csv = CSV.open('file1.csv')
csv.each do |row|
if seq['Seq'].to_i == csv.lineno - 1
row[2] = seq['Lang']
seq = e.next rescue ({'Seq' => -1})
end
output << row
end