
Elasticsearch - What is faster? Index identical document or update with detect_noop: true?


I have a parent-child document mapping in which the parent has only one field, contact_id. I need to make sure that the parent document exists when I insert a new child document; it may or may not exist already.

So I use the Bulk API to insert the parent if it does not exist and insert the child in a single request.

My question is which method is faster: an update with doc_as_upsert and detect_noop, or indexing a new record with the same data, which probably already exists:

{ update: { _index: 'index_name', _type: 'contact', _id: 25, _routing: 14}}
{ doc: { contact_id: 25 }, doc_as_upsert: true, detect_noop: true }
{ index: { _index: 'index_name', _type: 'event', _routing: 14, _parent: 25}}
{ ... event document body ...}

OR

{ index: { _index: 'index_name', _type: 'contact', _id: 25, _routing: 14}}
{ contact_id: 25 }
{ index: { _index: 'index_name', _type: 'event', _routing: 14, _parent: 25}}
{ ... event document body ...}
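
For comparison, the two request bodies above can also be assembled programmatically. The following is a minimal Ruby sketch; the helper names (`parent_upsert_actions`, `parent_index_actions`, `child_index_actions`, `bulk_body`) are hypothetical and do not come from any library. Only the action and metadata keys are taken from the question.

```ruby
# Hypothetical helpers (not part of any library) that build the two
# bulk request bodies compared in the question.

# Variant 1: upsert the parent through a partial update. With detect_noop,
# Elasticsearch treats the update as a no-op when nothing would change.
def parent_upsert_actions(index, contact_id, routing)
  [
    { update: { _index: index, _type: 'contact', _id: contact_id, _routing: routing } },
    { doc: { contact_id: contact_id }, doc_as_upsert: true, detect_noop: true }
  ]
end

# Variant 2: unconditionally (re)index the parent document.
def parent_index_actions(index, contact_id, routing)
  [
    { index: { _index: index, _type: 'contact', _id: contact_id, _routing: routing } },
    { contact_id: contact_id }
  ]
end

# Child document, routed with the same routing value as its parent.
def child_index_actions(index, parent_id, routing, event)
  [
    { index: { _index: index, _type: 'event', _routing: routing, _parent: parent_id } },
    event
  ]
end

# A bulk body is the parent action pair followed by the child action pair.
def bulk_body(parent_actions, child_actions)
  parent_actions + child_actions
end
```

Either result of `bulk_body` can then be passed as the `body:` argument of the Ruby client's `bulk` method, which is how the benchmark in the answer sends its requests.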

Solution

  • The two approaches seem to perform the same; the wall-clock (real) times differ by less than 1%:

                       user     system      total        real
    update_10k_x1  6.460000   1.720000   8.180000 ( 79.737009)
    index_10k_x1   6.300000   1.680000   7.980000 ( 80.067855)
    update_10k_x2  12.660000   3.350000  16.010000 (159.787347)
    index_10k_x2   12.690000   3.380000  16.070000 (160.276717)
    update_10k_x3  18.870000   5.000000  23.870000 (242.023184)
    index_10k_x3   18.940000   5.030000  23.970000 (240.063431)
    

    Here is the benchmark code:

    require 'benchmark'
    require 'elasticsearch'
    
    $client = Elasticsearch::Client.new
    
    # Upserts 10,000 contacts n times, sending one bulk request per document.
    def update_10k(n)
      index_name = "#{__method__}_x#{n}"
      n.times do
        (1..10000).each do |id|
          body = []
          body << { update: {_index: index_name, _type: :contact, _id: id }}
          body << { doc: { contact_id: id }, doc_as_upsert: true, detect_noop: true }
          $client.bulk body: body
        end
      end
    end
    
    # Indexes the same 10,000 contacts n times, one bulk request per document.
    def index_10k(n)
      index_name = "#{__method__}_x#{n}"
      n.times do
        (1..10000).each do |id|
          body = []
          body << { index: {_index: index_name, _type: :contact, _id: id }}
          body << { contact_id: id }
          $client.bulk body: body
        end
      end
    end
    
    Benchmark.bm do |x|
      (1..3).each do |n|
        x.report("update_10k_x#{n}") { update_10k(n) }
        x.report("index_10k_x#{n}") { index_10k(n) }
      end
    end
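
To put a number on "the same", the relative difference between the two approaches can be computed from the real (wall-clock) column of the benchmark table above. This is only a small arithmetic sketch: the timings are copied from that table, and the names `REAL_SECONDS`, `DIFFS`, and `relative_diff_percent` are made up for illustration.

```ruby
# Wall-clock ("real") seconds copied from the benchmark table above,
# keyed by the number of passes over the 10,000 documents.
REAL_SECONDS = {
  1 => { update: 79.737009,  index: 80.067855 },
  2 => { update: 159.787347, index: 160.276717 },
  3 => { update: 242.023184, index: 240.063431 }
}.freeze

# Percentage by which index differs from update (positive: index was slower).
def relative_diff_percent(update, index)
  (index - update) / update * 100.0
end

DIFFS = REAL_SECONDS.transform_values do |t|
  relative_diff_percent(t[:update], t[:index])
end

DIFFS.each do |passes, pct|
  puts format('x%d: index vs update %+.2f%%', passes, pct)
end
```

Every difference is below 1% in magnitude and the sign flips on the third pass, which supports reading the two methods as equivalent for this workload rather than one being consistently faster.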