I'm using Searchkick 3.1.0.
I have to bulk index a specific collection of records. From what I have read in the docs and tried, I cannot pass a predefined array of ids to Searchkick's reindex method. I'm using the async mode.
If you call, for example, Klass.reindex(async: true), it enqueues jobs using the batch_size specified in your options. The problem is that it loops through all of the model's ids and only then determines whether each one has to be indexed. For example, if I have 10 000 records in my database and a batch size of 200, it will enqueue 50 jobs. Each job then loops over its ids and indexes a record only if the search_import's conditions are met.
This step is wasteful. I would like to enqueue a pre-filtered array of ids so that the jobs do not loop through every record.
I tried writing the following job to override the normal behavior:
def perform(class_name, batch_size = 100, offset = 0)
  model = class_name.constantize
  # Collect only the ids of the records that should be indexed
  ids = model
        .joins(:user)
        .where(user: { active: true, id: $rollout.get(:searchkick).users })
        .where("#{class_name.downcase.pluralize}.id > ?", offset)
        .pluck(:id)
  # Enqueue one bulk reindex job per batch of ids
  until ids.empty?
    ids_to_enqueue = ids.shift(batch_size)
    Searchkick::BulkReindexJob.perform_later(
      class_name: model.name,
      record_ids: ids_to_enqueue
    )
  end
end
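As an aside, the batching loop above can be sketched in plain Ruby with each_slice, which yields fixed-size batches without mutating the id array. This is only a minimal sketch: enqueue_in_batches is a hypothetical helper, and the block stands in for the Searchkick::BulkReindexJob.perform_later call.

```ruby
# Hypothetical helper (not part of Searchkick): yields ids in fixed-size batches.
# The block stands in for the Searchkick::BulkReindexJob.perform_later call.
def enqueue_in_batches(ids, batch_size = 100)
  batches = 0
  ids.each_slice(batch_size) do |batch|
    yield batch
    batches += 1
  end
  batches # number of jobs that would be enqueued
end

# Usage: 450 pre-filtered ids with a batch size of 200 are split into 3 batches.
enqueued = []
count = enqueue_in_batches((1..450).to_a, 200) { |batch| enqueued << batch }
```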
The problem: the Searchkick mapping options are completely ignored when the records are inserted into Elasticsearch, and I can't figure out why. It doesn't use the specified match (text_middle) and instead creates a mapping with the default match 'keyword'.
Is there any clean way to bulk reindex an array of records without having to enqueue jobs containing unwanted records?
You should be able to reindex records based on a condition:
From the Searchkick docs:
Reindex multiple records
Product.where(store_id: 1).reindex
You can put that in your own delayed job.
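A minimal sketch of such a job, under some assumptions: ScopedReindexJob is a hypothetical name, and in a Rails app the class would typically inherit from ApplicationJob and use class_name.constantize. Object.const_get is used here only to keep the sketch free of dependencies. The where(...).reindex call is the one quoted from the docs above.

```ruby
# Hypothetical job (name and arguments are assumptions, not Searchkick API).
# In a Rails app this would typically inherit from ApplicationJob.
class ScopedReindexJob
  # class_name: name of a Searchkick-enabled model, e.g. "Product"
  # conditions: hash passed to `where`, e.g. { store_id: 1 }
  def perform(class_name, conditions)
    model = Object.const_get(class_name) # `class_name.constantize` in Rails
    # Reindex only the records matching the scope, as in the docs example
    model.where(conditions).reindex
  end
end
```

With ActiveJob, it could then be enqueued with something like ScopedReindexJob.perform_later("Product", { store_id: 1 }).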
What I have done, for some of our batch operations that already run in a delayed job, is wrap the job's code in the bulk block, which is also described in the Searchkick docs.
Searchkick.callbacks(:bulk) do
  # ... wrap some batch operations on models instrumented with Searchkick.
  # The bulk block should be outside of any transaction block.
end
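For illustration, here is a hedged sketch of what such a wrapped batch operation might look like. update_in_bulk is a hypothetical helper, and the update call assumes ActiveRecord-style records. Only Searchkick.callbacks(:bulk) comes from the Searchkick docs.

```ruby
# Hypothetical helper: applies the same attribute changes to many records
# inside a single Searchkick bulk callbacks block.
# Assumes each record responds to `update` (e.g. ActiveRecord models).
def update_in_bulk(records, attrs)
  # Keep this block outside of any transaction block
  Searchkick.callbacks(:bulk) do
    records.each { |record| record.update(attrs) }
  end
end
```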