I wrote a rake task in Rails to update my users table with a gender value. All it should do is loop through my users and set the gender attribute to a value that I get from a "gender detection" gem (which works well).
The rake task has been running for minutes, even though I only have a few dozen records in my database:
```ruby
require 'gender_detector'

namespace :user do
  desc 'Assign gender to all users'
  task :genderize => :environment do
    User.all.each do |user|
      gd = GenderDetector.new(:case_sensitive => false)
      gender = gd.get_gender(user.firstname)
      sql = "UPDATE users SET gender = '#{gender}' WHERE id = #{user.id}"
      ActiveRecord::Base.connection.execute(sql)
    end
  end
end
```
What am I doing wrong?
There are several components in that rake task: the Rails boot, the database, GenderDetector, etc. You should isolate and benchmark each component to find out where the bottleneck is.
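A minimal way to isolate the components is to wrap each one in a timer. The sketch below uses Ruby's standard `Benchmark.realtime`; the `timed` helper and the stand-in workloads are hypothetical placeholders for the real calls in the task.

```ruby
require 'benchmark'

# Time a labelled block, print the elapsed wall-clock seconds, and return
# the block's result so the helper can wrap existing calls unchanged.
def timed(label)
  result = nil
  seconds = Benchmark.realtime { result = yield }
  puts format('%-10s %.3fs', label, seconds)
  result
end

# Hypothetical stand-ins for the real components; in the rake task these
# might be `User.select('id, firstname').to_a` and `GenderDetector.new`.
names    = timed('load')     { %w[Alice Bob Carol] }
detector = timed('detector') { { 'Alice' => :female, 'Bob' => :male } }
genders  = timed('detect')   { names.map { |n| detector.fetch(n, :unknown) } }
```

To measure the boot cost on its own, timing `bundle exec rake environment` from the shell shows how long loading the Rails environment takes before any of your own code runs.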
Depending on how many gems you have, the Rails environment may take anywhere from a few seconds up to a minute to boot. Therefore, the `:environment` requirement may slow down your task.
I have no idea what `GenderDetector` does or how it works internally. If it queries a web service, for example, IO may slow down your task as well.
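Whatever `GenderDetector` does internally, the task calls `GenderDetector.new` once per user, so any setup cost in its constructor (an assumption here; for example, reading a name list) is paid on every iteration. The sketch below uses a hypothetical `SlowDetector` class to contrast constructing per record with constructing once, plus an optional per-name cache.

```ruby
# Hypothetical stand-in for a detector whose constructor is expensive
# (for example, one that parses a large name list when created).
class SlowDetector
  @instances = 0

  class << self
    attr_accessor :instances
  end

  def initialize
    self.class.instances += 1
    sleep 0.01 # simulated setup cost, paid on every .new
    @names = { 'anna' => :female, 'john' => :male }
  end

  def get_gender(name)
    @names.fetch(name.downcase, :unknown)
  end
end

first_names = %w[Anna John Anna]

# Per-record construction: the setup runs once for every name.
first_names.map { |name| SlowDetector.new.get_gender(name) }
per_record_count = SlowDetector.instances

# Construct once and reuse; the Hash also caches one answer per distinct name.
SlowDetector.instances = 0
detector = SlowDetector.new
cache = Hash.new { |h, name| h[name] = detector.get_gender(name) }
genders = first_names.map { |name| cache[name] }
reused_count = SlowDetector.instances
```

With a few dozen users the per-record version repeats the setup a few dozen times, while the reused version pays it once.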
Finally, you can also optimize your query to avoid loading unnecessary data from the database.
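```ruby
require 'gender_detector'

namespace :user do
  desc 'Assign gender to all users'
  task :genderize => :environment do
    # Build the detector once instead of once per user.
    gd = GenderDetector.new(:case_sensitive => false)

    # Load only the needed columns, in batches.
    User.select('id, firstname').find_each do |user|
      gender = gd.get_gender(user.firstname)
      # Filter on the primary key `id` (the users table has no `user_id`).
      User.where(id: user.id).update_all(gender: gender.to_s)
    end
  end
end
```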
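The task above still issues one UPDATE per user. A further step, sketched here under the assumption that the detected values fall into a small set, is to group ids by gender and issue one `UPDATE ... WHERE id IN (...)` per group. The pure-Ruby grouping below runs on its own; the `lookup` hash is a hypothetical stand-in for `GenderDetector#get_gender`, and the ActiveRecord calls in the comments (`pluck` with several columns needs Rails 4 or later) show where it would plug in.

```ruby
# Rows as [id, firstname] pairs, e.g. from
#   User.pluck(:id, :firstname)
rows = [[1, 'Anna'], [2, 'John'], [3, 'Maria']]

# Hypothetical lookup standing in for GenderDetector#get_gender.
lookup = { 'anna' => :female, 'john' => :male, 'maria' => :female }

# Map each gender to the list of ids that should receive it.
ids_by_gender = rows.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(id, name), acc|
  acc[lookup.fetch(name.downcase, :unknown)] << id
end

# In the rake task, each entry would become a single UPDATE:
#   ids_by_gender.each do |gender, ids|
#     User.where(id: ids).update_all(gender: gender.to_s)
#   end
```

For a few dozen rows this reduces the database round trips from one per user to one per distinct gender.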