regex, logstash, logstash-grok

grok parsing issue


I have an input line that looks like this:

localhost_9999.kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec.OneMinuteRate

and I can use this pattern to parse it:

%{DATA:kafka_node}:type=%{DATA:kafka_metric_type},name=%{JAVACLASS:kafka_metric_name}

which gives me this:

{
  "kafka_node": [
    [
      "localhost_9999.kafka.server"
    ]
  ],
  "kafka_metric_type": [
    [
      "SessionExpireListener"
    ]
  ],
  "kafka_metric_name": [
    [
      "ZooKeeperSyncConnectsPerSec.OneMinuteRate"
    ]
  ]
}

I want to split OneMinuteRate into a separate field but can't seem to get it to work. I've tried this:

%{DATA:kafka_node}:type=%{DATA:kafka_metric_type},name=%{WORD:kafka_metric_name}.%{WORD:attr_type}"

but then I get nothing back.

I'm also using https://grokdebug.herokuapp.com/ to test these out...


Solution

  • Either use your last regex with the . escaped (an unescaped . matches any character except a newline, while \. matches a literal dot), or use DATA for the second-to-last field and GREEDYDATA for the last field:

    %{DATA:kafka_node}:type=%{DATA:kafka_metric_type},name=%{DATA:kafka_metric_name}\.%{GREEDYDATA:attr_type}
    

    Since %{DATA:name} translates to (?<name>.*?) and %{GREEDYDATA:name} translates to (?<name>.*), kafka_metric_name lazily matches zero or more characters, as few as possible, stopping at the first literal ., while the .* behind attr_type greedily "eats up" the rest of the line up to its end.
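
If you want to check the lazy/greedy behaviour outside of Logstash, here is a minimal sketch in Python. It assumes the grok pattern can be hand-expanded as described above (DATA as .*? and GREEDYDATA as .*). The named groups use Python's (?P<name>...) syntax rather than grok's (?<name>...), and the names PATTERN, line and fields are only illustrative. This is not how Logstash compiles the pattern, just the same regex logic applied to the sample line from the question:

```python
import re

# Hand-expanded stand-in for the grok pattern (assumption for illustration):
#   %{DATA:x}       -> (?P<x>.*?)   lazy, as few chars as possible
#   %{GREEDYDATA:x} -> (?P<x>.*)    greedy, as many chars as possible
PATTERN = re.compile(
    r"(?P<kafka_node>.*?):type=(?P<kafka_metric_type>.*?),"
    r"name=(?P<kafka_metric_name>.*?)\.(?P<attr_type>.*)"
)

# Sample input line from the question.
line = (
    "localhost_9999.kafka.server:type=SessionExpireListener,"
    "name=ZooKeeperSyncConnectsPerSec.OneMinuteRate"
)

match = PATTERN.match(line)
# Fall back to an empty dict if the line does not match at all.
fields = match.groupdict() if match else {}

for key, value in fields.items():
    print(f"{key}: {value}")
```

The escaped \. is what keeps kafka_metric_name from running past the dot. The dots inside localhost_9999.kafka.server do no harm, because kafka_node is bounded by the literal :type= that follows it.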