Tags: elasticsearch, logstash, elastic-stack, logstash-grok, logstash-configuration

Logstash (Extracting parts of fields using regex)


I am using the Kafka input plugin to feed data from Kafka into Logstash.

input {
    kafka {
        bootstrap_servers => ["{{ kafka_bootstrap_server }}"]
        codec => "json"
        group_id => "{{ kafka_consumer_group_id }}"
        auto_offset_reset => "earliest"
        topics_pattern => ".*"   # reads from all Kafka topics
        decorate_events => true
        add_field => { "[@metadata][label]" => "kafka-read" }

    }
}

The Kafka topics are of the format ingest-abc and ingest-xyz.

I use the following filter to set the [@metadata][index_prefix] field, which specifies the ES index each event should end up in.

filter {
    mutate {
        add_field => { 
                       "[@metadata][index_prefix]" => "%{[@metadata][kafka][topic]}"
                     }
        remove_field => ["[kafka][partition]", "[kafka][key]"]
    }
    if [message] {
        mutate {
          add_field => { "[pipeline_metadata][normalizer][original_raw_message]" => "%{message}" }
        }
    }
}

So my ES indexes end up being:
ingest-abc-YYYY-MM-DD
ingest-xyz-YYYY-MM-DD
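
For context, the Elasticsearch output is not shown in the question, but something along these lines would produce those index names from the metadata field (the host placeholder is hypothetical, following the templated style used above):

output {
    elasticsearch {
        hosts => ["{{ es_host }}"]   # hypothetical placeholder
        # the index prefix set in the filter plus the event date
        index => "%{[@metadata][index_prefix]}-%{+YYYY-MM-dd}"
    }
}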

How do I get rid of the common ingest- prefix so the indexes become abc-YYYY-MM-DD and xyz-YYYY-MM-DD instead?

The regex that matches it is: (?!ingest)\b(?!-)\S+ but I am not sure where it would fit in the config.
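
For reference, one place a regex-based extraction could go is a grok filter that captures the topic suffix straight into the metadata field. This is only a sketch, assuming the topic name is available in [@metadata][kafka][topic] as set up by decorate_events above; it is not from the original post:

filter {
    grok {
        # hypothetical alternative: capture everything after "ingest-" into index_prefix
        match => { "[@metadata][kafka][topic]" => "^ingest-%{GREEDYDATA:[@metadata][index_prefix]}$" }
    }
}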

Thanks!


Solution

  • OK, so I figured it out, in case anyone ever stumbles on a similar problem: I basically used a mutate gsub option instead of additional filters and grok.

    gsub replaces any text matching the second argument with the text passed as the third argument:

    filter {
        mutate {
            # move the Kafka metadata onto the event itself
            rename => { "[@metadata][kafka]" => "kafka" }
            # strip the common "ingest-" prefix from the index prefix
            gsub => [ "[@metadata][index_prefix]", "ingest-", "" ]
        }
    }
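
    To check the stripped prefix while developing, a stdout output with the rubydebug codec and metadata enabled can print the @metadata fields. This is a debugging sketch, not part of the original answer:

    output {
        stdout {
            # metadata => true also prints @metadata, so [@metadata][index_prefix] can be inspected
            codec => rubydebug { metadata => true }
        }
    }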