Search code examples
ruby-on-railsrubyarraysruby-on-rails-2

Ruby on Rails - Most Efficient Way to Compare Two Large Arrays for non-matching results


I have some legacy code in my app that compares all table rows for a model with the result of an API call from a supplier.

This worked fine until last week where both the number of table rows and the number of results by the supplier increased massively.

The table cardinality has went from 1,657 to 59,699 and the number of results returned by the API is ~ 150,000.

What the code is doing is looking in the API results to check that if the current table row data is not found, if so then the current data is orphaned in the database since it exists there but not in what the supplier has given us.

Looking through 150,000 results to check if something isn't there doesn't sound particularly clever to me and that looks to be the case as I don't even know how long this takes to run as the view is still loading after about half an hour :/

Controller

@telco_numbers = TelcoNumber.orphaned_in_db

Model

def self.orphaned_in_db
  db_numbers = self.find(:all)
  listed_numbers = self.all_telco
  orphaned_numbers = []
  db_numbers.each do |db|
    scan = listed_numbers.select{ |l| l.number == db.number}
    orphaned_numbers.push(db) if scan.empty?
  end
  return orphaned_numbers
end

def self.some_telco(per_page, page = 1)
  page = 1 if page.nil?
  # this is the first api call which returns a link which is then used for the next api call
  api_call = TelcoApiv3.new("post", "/numbers/#{TelcoApiv3.account_id}/allocated/all")
  listed_numbers = TelcoApiv3.poll(api_call.response["link"])
  return listed_numbers.collect do |ln| 
    ln.store("countrycode", ln["country_code"])
    TelcoNumber.new ln
  end
end

def self.all_telco(page = 1)
  listed_numbers = some_telco(@@max_nlist_results, page)
  if listed_numbers.length == @@max_nlist_results
    return listed_numbers.concat(all_telco(page + 1))
  else
    return listed_numbers
  end
end

Example API result format:

[{"country_code":"44","number":"1133508889"},....

The number relates to the number column in the table for the model. (It is stored as a varchar and not as a number).

Also, the api results are returned in ascending number order so are already sorted so I would have thought that would have made things better than they are?


Solution

  • Why are you not trying Array difference. First make two arrays of db_numbers & listed_numbers and subtract the smaller array from the bigger one like this:

    def self.orphaned_in_db
      db_numbers = self.find(:all).map{|x| x.number}
      listed_numbers = self.all_telco.map{|x| x.number}
      orphaned_numbers = db_numbers - listed_numbers
      orphaned_results = self.find(orphaned_numbers)
      return orphaned_results
    end
    

    When I will subtract the listed_numbers from db_numbers, I will get the non-matching results set. and now you can find the results on the basis of orphaned_numbers in your database. It will be much faster. Thanks