elasticsearch logstash logstash-configuration

logstash output to elasticsearch with document_id; what to do when I don't have a document_id?


I have some Logstash input where I use document_id to remove duplicates. However, most of the input doesn't have a document_id. The output below passes the real document_id through when it exists, but when the field is missing, the ID is set to the literal string %{document_id}, so most documents are treated as duplicates of each other. Here's what my output block looks like:

output {
        elasticsearch_http {
            host => "127.0.0.1"
            document_id => "%{document_id}"
        }
}

I thought I might be able to use a conditional in the output. It fails, and the error is given below the code.

output {
        elasticsearch_http {
            host => "127.0.0.1"
            if document_id {
                document_id => "%{document_id}"
            } 
        }
}

Error: Expected one of #, => at line 101, column 8 (byte 3103) after output {
        elasticsearch_http {
    host => "127.0.0.1"
    if 

I tried a few "if" statements and they all fail, which is why I assume the problem is having a conditional of any sort in that block. Here are the alternatives I tried:

if document_id <> "" {
if [document_id] <> "" {
if [document_id] {
if "hello" <> "" {

Solution

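  • You're close with the conditional idea, but you can't place a conditional inside a plugin block. Only setting => value pairs are allowed there, which is why the parser complains that it expected =>. Wrap the whole plugin block in the conditional instead: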

    output {
      if [document_id] {
        elasticsearch_http {
          host => "127.0.0.1"
          document_id => "%{document_id}"
        } 
      } else {
        elasticsearch_http {
          host => "127.0.0.1"
        } 
      }
    }
    

    (But the suggestion in one of the other answers to use the uuid filter is good too.)
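
    For reference, here is a minimal sketch of that uuid approach. It assumes the uuid filter plugin is available in your Logstash version, and that writing the generated value into the same document_id field is acceptable for your events; I haven't tested it against your config. The filter only runs when the field is missing, so events that already carry an ID keep it, and a single output block is enough:

    filter {
      # Only generate an ID when the event doesn't already have one.
      if ![document_id] {
        uuid {
          # Store the generated UUID in the field the output reads from.
          target => "document_id"
        }
      }
    }

    output {
      elasticsearch_http {
        host => "127.0.0.1"
        document_id => "%{document_id}"
      }
    }

    Note that generated UUIDs are random, so events without their own ID are never deduplicated against each other; they just stop colliding on the literal %{document_id} string.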