Tags: elasticsearch, logstash, pipeline, logstash-configuration

Logstash: how to add a new nested field to all documents matching a query?


I am using ELK 7.12.
My external JSON:

{"req-id":"Test9","process-code":"demo9","field1":1,"field2":"abc"}

Elasticsearch document:

{"docid":"...", "h":{...},"a":{...}}

Intended output:

{"docid":"...", "h":{...},"a":{...}, "externaldata":{"field1":1,"field2":"abc"}}

Logstash pipeline:

filter {
    elasticsearch {
        hosts => "http://localhost:9200/"
        user => elastic
        password => elastic
        index => "demo7"
        query => "h.req-id:%{[req-id]} AND h.process-code:%{[process-code]}"
        docinfo_fields => {
          "_id" => "docid"
        }
    }
    if ("_elasticsearch_lookup_failure" not in [tags]) {
        mutate {
            add_field => {"externaldata"=>{}}
            add_field => { "externaldatafield1" => "%{[field1]}" }
            add_field => { "externaldatafield2" => "%{[field2]}" }
        }
        mutate {
            rename => {
                "externaldatafield1" => "[externaldata][field1]"
                "externaldatafield2" => "[externaldata][field2]"
            }
        }
    }
}
output {
    elasticsearch {
        hosts => "http://localhost:9200/"
        user => elastic
        password => elastic
        index => "demo7"
        action => "update"
        doc_as_upsert => true
        document_id => "%{docid}"
    }
}

Error:

"error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [externaldata] of type [text] in document with id '901'. Preview of field's value: '{field1=1, field2=abcd}'"

I have tried a few combinations from other SO posts to add a nested field to the event, but the pipeline failed to execute. What is the right syntax? My study reference is this.

Edit 1:
As per the comment from leandrojmp, the result of GET /demo7/_mapping is:

"externaldata" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }

After studying the mapping concept, I deleted the index and re-ran the pipeline successfully.

The issue now is that only one of the two documents matching the query is updated. How can I add the externaldata field to all matching documents? Should I use some kind of loop/jump code? A reference would help.

Edit 2:
My original question about the additional field and the mapping error was solved by leandrojmp, so I am accepting their answer. The multi-document update issue still exists, though. So far, my understanding is that instead of the "elasticsearch" filter plugin, we should use the "http" or "exec" plugin.
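A possible reason only one document is updated: the elasticsearch filter's result_size option defaults to 1, so the lookup returns a single hit and each input event produces a single update. The sketch below keeps everything inside Logstash. It assumes that when result_size is greater than 1 and several documents match, the filter stores docinfo_fields as an array (verify this against your plugin version). It raises result_size and uses the split filter to clone the event once per matching id. The limit of 100 is an arbitrary example value; the mutate filters and the output can stay as they are.

```logstash
filter {
    elasticsearch {
        hosts => "http://localhost:9200/"
        user => elastic
        password => elastic
        index => "demo7"
        query => "h.req-id:%{[req-id]} AND h.process-code:%{[process-code]}"
        # Return up to 100 hits instead of the default 1 (100 is an arbitrary example limit)
        result_size => 100
        docinfo_fields => {
          "_id" => "docid"
        }
    }
    # Assumption: docid holds an array of ids when more than one document matched.
    # split emits one copy of the event per element, so each copy carries a single docid.
    # The conditional skips events where the lookup found nothing.
    if [docid] {
        split {
            field => "docid"
        }
    }
    # ... existing mutate filters that build [externaldata] go here ...
}
```

Each copy then reaches the elasticsearch output with its own docid, so document_id => "%{docid}" updates a different document per copy. If the number of matches can be large, sending an _update_by_query request instead (for example through the http plugin mentioned above) avoids fetching the ids at all.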


Solution

  • This error means that your index already has the field externaldata mapped as text (from a document indexed earlier), and now you are trying to index the same field as an object.

    For example, if one document has externaldata as text:

    { 
        "externaldata": "some string text value" 
    }
    

    and another document has externaldata as an object:

    { 
        "externaldata": {
            "field1": "1",
            "field2": "2"
        }
    }
    

    One of these two documents will be rejected; which one depends on your mapping. If you have not explicitly applied a mapping, Elasticsearch maps the field dynamically using the type of the first value it receives, which in your case seems to be text.

    To solve this, delete your index and then either apply an explicit mapping for the externaldata field or make sure the first document you index has this field as an object. The type of an existing field cannot be changed in place.

    The mapping would be something like this:

    PUT demo7
    {
        "mappings": {
            "properties": {
                "externaldata": {
                    "properties": {
                        "field1": { "type": "keyword" },
                        "field2": { "type": "keyword" }
                    }
                }
            }
        }
    }
    

    If some of your documents have this field as something other than an object, you will need to rename it in those documents: the same field cannot be both a string and an object.

    Also, your mutate filter is wrong; you do not need to create an empty externaldata field first or rename temporary fields. Something like this is enough:

    mutate {
        add_field => {
            "[externaldata][field1]" => "%{[field1]}"
            "[externaldata][field2]" => "%{[field2]}"
        }
    }