I have some legacy code in my app that compares all table rows for a model with the result of an API call from a supplier.
This worked fine until last week where both the number of table rows and the number of results by the supplier increased massively.
The table cardinality has went from 1,657 to 59,699 and the number of results returned by the API is ~ 150,000.
What the code is doing is looking in the API results to check that if the current table row data is not found, if so then the current data is orphaned in the database since it exists there but not in what the supplier has given us.
Looking through 150,000 results to check if something isn't there doesn't sound particularly clever to me and that looks to be the case as I don't even know how long this takes to run as the view is still loading after about half an hour :/
Controller
@telco_numbers = TelcoNumber.orphaned_in_db
Model
def self.orphaned_in_db
db_numbers = self.find(:all)
listed_numbers = self.all_telco
orphaned_numbers = []
db_numbers.each do |db|
scan = listed_numbers.select{ |l| l.number == db.number}
orphaned_numbers.push(db) if scan.empty?
end
return orphaned_numbers
end
def self.some_telco(per_page, page = 1)
page = 1 if page.nil?
# this is the first api call which returns a link which is then used for the next api call
api_call = TelcoApiv3.new("post", "/numbers/#{TelcoApiv3.account_id}/allocated/all")
listed_numbers = TelcoApiv3.poll(api_call.response["link"])
return listed_numbers.collect do |ln|
ln.store("countrycode", ln["country_code"])
TelcoNumber.new ln
end
end
def self.all_telco(page = 1)
listed_numbers = some_telco(@@max_nlist_results, page)
if listed_numbers.length == @@max_nlist_results
return listed_numbers.concat(all_telco(page + 1))
else
return listed_numbers
end
end
Example API result format:
[{"country_code":"44","number":"1133508889"},....
The number
relates to the number
column in the table for the model. (It is stored as a varchar and not as a number).
Also, the api results are returned in ascending number order so are already sorted so I would have thought that would have made things better than they are?
Why are you not trying Array difference. First make two arrays of db_numbers & listed_numbers and subtract the smaller array from the bigger one like this:
def self.orphaned_in_db
db_numbers = self.find(:all).map{|x| x.number}
listed_numbers = self.all_telco.map{|x| x.number}
orphaned_numbers = db_numbers - listed_numbers
orphaned_results = self.find(orphaned_numbers)
return orphaned_results
end
When I will subtract the listed_numbers from db_numbers, I will get the non-matching results set. and now you can find the results on the basis of orphaned_numbers in your database. It will be much faster. Thanks