I'm learning how to use Celluloid. I've read all the documentation and think I have the idea of how to use it, but I lack practice. I'm about to test it on a CSV file with almost 12,000 rows.
I'm unsure how many actors I should assign to a job; I'm guessing the number should be dynamic. According to this RailsCasts episode the default is the number of cores on your machine, but surely you should change it based on your workload?
I have 12,000 records to get through. If I execute the code below, I'm guessing it will initiate all the actors in my pool and queue up the jobs, but how should I gauge how many actors to dynamically assign to the work?
There are still many holes in my understanding, so feel free to challenge my whole implementation.
class Model < ActiveRecord::Base
  include Celluloid

  def initialize(row)
    self.name = row[0]
    self.alt_id = row[1]
    self.definition = row[2]
    self.save
    self.terminate
  end
end

CSV.open("./files/my_file.csv", "wb") do |csv|
  Model.supervise(csv)
end
First, in your case you should create a separate class for your actor instead of mixing Celluloid into your ActiveRecord model.
class Model < ActiveRecord::Base
  def self.save_from_csv(row)
    new.tap do |m|
      m.name = row[0]
      m.alt_id = row[1]
      m.definition = row[2]
      m.save
    end
  end
end
class CSVWorker
  include Celluloid

  def persist_from_csv(row)
    Model.save_from_csv(row)
  end
end
Then you can create a pool and do the work for each row. The size: 4 below is just an example: the pool defaults to one actor per core, and since this job is mostly database IO, the right size is something to benchmark rather than guess.
pool = CSVWorker.pool(size: 4)
CSV.foreach("./files/my_file.csv") do |row|
  pool.async.persist_from_csv(row)
end
Notice the async. That's what makes it run in pseudo-parallel.
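One caveat, since async calls return immediately: if your script exits right after the CSV loop, some queued rows may not have been persisted yet. If you need to block until every row is done, futures are one option. A rough, untested sketch reusing the CSVWorker above:

require 'csv'

pool = CSVWorker.pool(size: 4)
futures = []

CSV.foreach("./files/my_file.csv") do |row|
  # future queues the work like async, but hands back a Celluloid::Future
  futures << pool.future.persist_from_csv(row)
end

# calling value blocks until the corresponding worker has finished its row
futures.each(&:value)

The futures keep their results in memory until you read them, which is fine for 12,000 rows but worth batching for much bigger files.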
I admit I haven't tested this, but even if it Works™, you should benchmark it to see whether there's actually any gain from the parallelisation. I doubt it will be much faster on MRI, because the only IO involved is the DB queries.
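If you want to measure it, something like this (also untested; the file name and sample size are placeholders) compares the plain loop against the pool. Note it will insert the sample rows twice:

require 'benchmark'
require 'csv'

rows = CSV.read("./files/my_file.csv").first(1_000)  # small sample keeps the run short

serial = Benchmark.realtime do
  rows.each { |row| Model.save_from_csv(row) }
end

pooled = Benchmark.realtime do
  pool = CSVWorker.pool(size: 4)
  futures = rows.map { |row| pool.future.persist_from_csv(row) }
  futures.each(&:value) # wait for every worker before stopping the clock
end

puts "serial: #{'%.2f' % serial}s, pooled: #{'%.2f' % pooled}s"

If the pooled time isn't clearly lower, stick with the plain loop.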