I am writing a class that takes a CSV files, transforms it, and then writes the new data out.
module Transformer
class Base
def initialize(file)
@file = file
end
def original_data(&block)
opts = { headers: true }
CSV.open(file, 'rb', opts, &block)
end
def transformer
# complex manipulations here like modifying columns, picking only certain
# columns to put into new_data, etc but simplified to `+10` to keep
# example concise
-> { |row| new_data << row['some_header'] + 10 }
end
def transformed_data
self.original_data(self.transformer)
end
def write_new_data
CSV.open('new_file.csv', 'wb', opts) do |new_data|
transformed_data
end
end
end
end
What I'd like to be able to do is:
transformer
so that instances only need to implement their own transformers but the behavior for reading and writing is given by the base class.But obviously the above doesn't work because I don't really have a reference to new_data
in transformer
.
How could I achieve this elegantly?
I can recommend one of two approaches, depending on your needs and personal taste.
I have intentionally distilled the code to just its bare minimum (without your wrapping class), for clarity.
Since you do not want to slurp the file, use CSV::Foreach
. For example, for a quick debugging session, do:
CSV.foreach "source.csv", headers: true do |row|
row["name"] = row["name"].upcase
row["new column"] = "new value"
p row
end
And if you wish to write to file during that same iteration:
require 'csv'
csv_options = { headers: true }
# Open the target file for writing
CSV.open("target.csv", "wb") do |target|
# Add a header
target << %w[new header column names]
# Iterate over the source CSV rows
CSV.foreach "source.csv", **csv_options do |row|
# Mutate and add columns
row["name"] = row["name"].upcase
row["new column"] = "new value"
# Push the new row to the target file
target << row
end
end
CSV::Converters
There is a built in functionality that might be helpful - CSV::Converters
- (see the :converters
definition in the CSV::New
documentation)
require 'csv'
# Register a converter in the options hash
csv_options = { headers: true, converters: [:stripper] }
# Define a converter
CSV::Converters[:stripper] = lambda do |value, field|
value ? value.to_s.strip : value
end
CSV.open("target.csv", "wb") do |target|
# same as above
CSV.foreach "source.csv", **csv_options do |row|
# same as above - input data will already be converted
# you can do additional things here if needed
end
end
Based on your comment, and since you want to minimize I/O and iterations, perhaps extracting the read/write operations from the responsibility of the transformers might be of interest. Something like this.
require 'csv'
class NameCapitalizer
def self.call(row)
row["name"] = row["name"].upcase
end
end
class EmailRemover
def self.call(row)
row.delete 'email'
end
end
csv_options = { headers: true }
converters = [NameCapitalizer, EmailRemover]
CSV.open("target.csv", "wb") do |target|
CSV.foreach "source.csv", **csv_options do |row|
converters.each { |c| c.call row }
target << row
end
end
Note that the above code still does not handle the header, in case it was changed. You will probably have to reserve the last row (after all transformations) and prepend its #headers
to the output CSV.
There are probably plenty other ways to do it, but the CSV class in Ruby does not have the cleanest interface, so I try to keep code that deals with it as simple as I can.