ruby-on-rails rails-activejob searchkick

Custom bulk indexer for Searchkick: mapping options are ignored


I'm using Searchkick 3.1.0

I have to bulk index a specific collection of records. From what I have read in the docs and tried, I cannot pass a predefined array of ids to Searchkick's reindex method. I'm using the async mode.

If you call, for example, Klass.reindex(async: true), it enqueues jobs using the batch_size specified in your options. The problem is that it loops through all of the model's ids and only then determines whether each one has to be indexed. For example, with 10,000 records in the database and a batch size of 200, it enqueues 50 jobs. Each job then loops over its ids and indexes a record only if the search_import conditions are met.
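The job count in that example is just the record count divided by the batch size, rounded up. A minimal sketch of the arithmetic, assuming one job per batch (`job_count` is a hypothetical helper, not part of Searchkick):

```ruby
# Hypothetical helper: number of jobs needed to cover every record,
# assuming one job is enqueued per batch.
def job_count(total_records, batch_size)
  (total_records.to_f / batch_size).ceil
end

# The figures from the example above: 10,000 records, batches of 200.
JOBS = job_count(10_000, 200)
```

Every one of those jobs is enqueued regardless of how many records actually pass the search_import conditions.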

This step is wasteful: I would like to enqueue a pre-filtered array of ids so that the jobs don't loop through every record.

I tried writing the following job to override the normal behavior:

def perform(class_name, batch_size = 100, offset = 0)
    model = class_name.constantize
    # Collect only the ids of the records that should be indexed
    ids = model
          .joins(:user)
          .where(user: { active: true, id: $rollout.get(:searchkick).users })
          .where("#{class_name.downcase.pluralize}.id > ?", offset)
          .pluck(:id)

    # Enqueue one bulk reindex job per batch of ids
    until ids.empty?
      ids_to_enqueue = ids.shift(batch_size)
      Searchkick::BulkReindexJob.perform_later(
          class_name: model.name,
          record_ids: ids_to_enqueue
      )
    end
end

The problem: the Searchkick mapping options are completely ignored when inserting records into Elasticsearch, and I can't figure out why. It doesn't use the specified match (text_middle) and instead creates a mapping with the default match 'keyword'.

Is there any clean way to bulk reindex an array of records without having to enqueue jobs containing unwanted records?


Solution

  • You should be able to reindex records based on a condition:

    From the searchkick docs:

    Reindex multiple records
    
    Product.where(store_id: 1).reindex
    

    You can put that in your own delayed job.

    What I have done for some of our batch operations that already run in a delayed job is wrap the job's code in the bulk block, which is also described in the Searchkick docs.

    Searchkick.callbacks(:bulk) do
      # ... wrap some batch operations on models instrumented with searchkick.
      # The bulk block should be outside of any transaction block.
    end
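To make the "put that in your own delayed job" suggestion more concrete, here is a minimal sketch of splitting a pre-filtered id list into batches and handing each batch to a job. It is plain Ruby so it runs without Rails. `enqueue_reindex_batches`, the `enqueuer` callable and `MyReindexJob` are hypothetical names used for illustration. The assumption is that the job would run something like `class_name.constantize.where(id: batch).reindex`, following the relation-based reindex quoted from the docs above.

```ruby
# Hypothetical helper: splits pre-filtered ids into batches and passes each
# batch to an enqueuer (any callable). In a Rails app the callable could wrap
# something like `MyReindexJob.perform_later(class_name, batch)`, where the
# (hypothetical) job runs `class_name.constantize.where(id: batch).reindex`.
def enqueue_reindex_batches(class_name, ids, batch_size: 100, enqueuer:)
  ids.each_slice(batch_size).map do |batch|
    enqueuer.call(class_name, batch)
    batch.size # report how many ids went into each batch
  end
end

# Stand-in enqueuer that only records what would have been enqueued.
ENQUEUED = []
RECORDER = ->(class_name, batch) { ENQUEUED << [class_name, batch] }

# 250 pre-filtered ids in batches of 100 give three batches.
BATCH_SIZES = enqueue_reindex_batches(
  "Product", (1..250).to_a, batch_size: 100, enqueuer: RECORDER
)
```

Only the ids that passed the filter are ever enqueued, so no job has to skip over unwanted records. Whether a relation-level reindex also picks up the custom mapping in this setup is not something the sketch verifies.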