
Ruby Hash Key with Multiple Values: returning the minimum value in a timely manner


UPDATE: I was initially overwriting the hash keys, but have since resolved that. Thank you for everyone's input so far.

The issue now is that the iterations take hours to produce any data:

customers csv has 22,000 rows.

fiber csv has 170,000 rows.

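require "csv"
require "haversine"

# headers: true is passed as a keyword argument so this also works on Ruby 3+
fiber = CSV.read("fiber.csv", headers: true)
customers = CSV.read("customers.csv", headers: true)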

hh = Hash.new { |hsh,key| hsh[key] = [] }

#for each customer, loop through all the fiber coords
customers.each do |c|
  fiber.each do |f|
    hh[c["cid"]].push Haversine.distance(c["lat"], c["lng"], f["lat"], f["lng"])
  end
end

vals = hh.map { |k, v| v.min } # the minimum distance per customer (which is what I want)

Since I'd like to use these values outside of the program/command line, I thought writing to a CSV would be an okay approach (other suggestions welcome).

However, the nested loop above runs for hours without ever finishing, so this approach is not workable as it stands.

CSV.open("hash_output.csv", "wb") {|csv| vals.each {|elem| csv << [elem]} }

Any ideas on how to speed this process up?


Solution

  • I think the problem is that you are overwriting the same hash key on each pass through the loop. I would do something like this:

    hh = Hash.new { |hsh,key| hsh[key] = [] }
    #for each customer, loop through all the fiber coords
    customers.each do |c|      
      fiber.each do |f|
        hh[c["last Name"]].push Haversine.distance(c["lat"], c["lng"], f["lat"], f["lng"])
      end
    end
    

    That way the keys will be the customer's last name and the values will be an array of distances. So the resulting data structure will look like this:

    { 
       "DOE" => [922224.16, 920129.46, 919214.42],
       ...
    }
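Since only the minimum per customer is needed, one option is to keep a running minimum instead of pushing every distance into an array. With 22,000 customers and 170,000 fiber points, the arrays above would hold about 3.74 billion values. The sketch below is only an illustration under stated assumptions: `haversine_km` is a hypothetical pure-Ruby stand-in for `Haversine.distance` (so the example runs without the gem), `nearest_distances` is a made-up helper name, and the rows are plain hashes with the `cid`, `lat`, and `lng` keys from the question. It still performs the same number of distance calculations, so it reduces memory use and repeated string conversion rather than the amount of arithmetic.

```ruby
EARTH_RADIUS_KM = 6371.0

# Hypothetical stand-in for Haversine.distance; returns kilometres as a Float.
def haversine_km(lat1, lng1, lat2, lng2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlng = (lng2 - lng1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Returns { cid => distance to the nearest fiber point }.
# Rows are assumed to respond to ["cid"], ["lat"], ["lng"] (CSV rows or hashes).
def nearest_distances(customers, fiber)
  # Convert the fiber coordinates to floats once, not once per customer.
  fiber_points = fiber.map { |f| [f["lat"].to_f, f["lng"].to_f] }

  customers.each_with_object({}) do |c, result|
    lat = c["lat"].to_f
    lng = c["lng"].to_f
    best = Float::INFINITY
    # Track only the smallest distance seen so far; nothing else is stored.
    fiber_points.each do |flat, flng|
      d = haversine_km(lat, lng, flat, flng)
      best = d if d < best
    end
    result[c["cid"]] = best
  end
end

# Tiny made-up data set, only to show the shape of the input and output.
sample_customers = [
  { "cid" => "1", "lat" => "0.0",  "lng" => "0.0" },
  { "cid" => "2", "lat" => "10.0", "lng" => "10.0" }
]
sample_fiber = [
  { "lat" => "0.0",  "lng" => "1.0" },
  { "lat" => "10.0", "lng" => "10.0" }
]

nearest = nearest_distances(sample_customers, sample_fiber)
nearest.each { |cid, dist| puts "#{cid},#{dist.round(2)}" }
```

The result is already keyed by `cid`, so each CSV row can be written as `[cid, distance]` rather than a bare distance, which keeps the output traceable back to a customer.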