Search code examples
rubyfastercsv

Pulling a value from one CSV based on a value in another


I am trying to figure out the best way to pull a value from a CSV file called lookup.csv based on a value in master.csv, and then save the new file as output.csv.

In the example below, the master file has three columns of data with the last column being City. I'd like to replace the City name with the City Code from the lookup.csv file.

I don't have a DB that I can lookup from so I am having to use the CSV file. I am trying to use FasterCSV with Ruby 1.8.7.

Example File Structure:

master.csv:

First Name | Last Name | City
Joey       | Jello     | Atlanta
Home       | Boy       | Dallas

lookup.csv:

City    | City ID
Atlanta | 12345
Dallas  | 12346
Houston | 12347

output.csv:

First Name | Last Name | City
Joey       | Jello     | 12345
Home       | Boy       | 12346

Solution

  • I'm using 1.9, where FasterCSV is available as CSV in the standard lib. First I'd create a lookup hash out of lookup.csv:

    cities = Hash[CSV.read('lookup.csv', :col_sep => ' | ').to_a[1..-1]]
    

    If the file is very big, you might want to iterate over it with CSV.foreach and build the hash row by row:

    cities = {}
    CSV.foreach('lookup.csv', :col_sep => ' | ', :headers => true, :return_headers => false) do |line|
      cities[line['City']] = line['City ID']  
    end  
    

    Then iterate over master.csv, do a lookup of the city in the hash and write that to output.csv:

    CSV.open('output.csv', "w", :headers => ['First Name', 'Last Name', 'City ID'], :write_headers => true) do |output|
      CSV.foreach('master.csv', :col_sep => ' | ', :headers => true, :return_headers => false) do |line|
        output << [line['First Name'], line['Last Name'], cities[line['City']]]
      end  
    end