I am trying to understand the DATA pattern in the grok plugin of Logstash.
According to the documentation, DATA is defined as follows:

DATA .*?

I interpreted this as "anything with a length of 1 to n" (please correct me if my understanding is wrong).
In my script, it fails to parse my input properly.
Logstash conf:

input {
  file {
    path => ["/home/osboxes/logstash_conf/mydir/test_logs/*"]
    start_position => "beginning"
    sincedb_path => "/home/osboxes/logstash_conf/mydir/.sincedb"
  }
}
filter {
  grok {
    match => { "message" => "^%{TIMESTAMP_ISO8601:timeStamp},%{DATA:ID},%{DATA:somedata}" }
  }
}
output {
  stdout {
    codec => json_lines
  }
}
Input:

2017-01-09 02:00:03.887,a,a

Output:

{
  "message": "2017-01-09 02:00:03.887,a,a",
  "@version": "1",
  "@timestamp": "2017-01-09T12:28:20.958Z",
  "path": "/home/osboxes/logstash_conf/mydir/test_logs/data",
  "host": "osboxes",
  "timeStamp": "2017-01-09 02:00:03.887",
  "ID": "a"
}
I expected the field somedata to be filled with a value (as happened for ID), but it is missing from the output. Can anyone help me understand the behavior of the DATA pattern?
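.*? matches any character (except a newline) between zero and unlimited times, as few times as possible, expanding only as needed. Because it can match zero times, the second DATA in your pattern captures an empty string, which is why you don't see a result. Grok does not add empty captures to the event unless keep_empty_captures is set to true, so somedata is omitted entirely instead of appearing as an empty string.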
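A quick way to check the zero-length behavior is to try the same lazy quantifier in Python's re module. This is only a sketch: grok runs on Ruby's regex engine rather than Python's, and we assume lazy quantifiers behave the same way for patterns this simple.

```python
import re

# A lazy ".*?" is satisfied by the empty string, so its length range is 0..n, not 1..n.
empty = re.match(r".*?", "abc").group()
print(repr(empty))  # ''

# It can even match a completely empty input.
matches_empty = re.fullmatch(r".*?", "") is not None
print(matches_empty)  # True

# For comparison, ".+?" must consume at least one character.
at_least_one = re.match(r".+?", "abc").group()
print(repr(at_least_one))  # 'a'
```

If you really want "1 to n characters", a lazy `.+?` is the quantifier that expresses it.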
Written out, the problematic part looks like this (capture groups added for readability):

,(.*?),(.*?)

Against your input, this matches only ,a, as follows:
1. Match the first ,.
2. Try to match .*? with as few characters as possible, adding one character at a time until the rest of the pattern can match. This matches the a.
3. Try to match the next ,. This succeeds, so the first .*? is done.
4. Try to match the second .*?. Since it can match zero characters, it does so, and the match is complete.
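The four steps can be reproduced outside Logstash with a rough Python sketch. Two assumptions are labeled here and in the code: `[^,]+` is a simplified stand-in for TIMESTAMP_ISO8601 (the real grok pattern is far more elaborate), and `grok_like_fields` is a hypothetical helper that imitates grok's default of dropping empty captures.

```python
import re

# Assumption: "[^,]+" stands in for TIMESTAMP_ISO8601; only the two DATA parts
# (".*?") are faithful to the real grok expansion.
PATTERN = r"^(?P<timeStamp>[^,]+),(?P<ID>.*?),(?P<somedata>.*?)"


def grok_like_fields(pattern, message):
    """Hypothetical helper: return the named captures of the first match,
    dropping empty ones, similar to grok with keep_empty_captures => false."""
    m = re.search(pattern, message)
    if m is None:
        return None
    return {name: value for name, value in m.groupdict().items() if value}


line = "2017-01-09 02:00:03.887,a,a"
fields = grok_like_fields(PATTERN, line)
print(fields)  # {'timeStamp': '2017-01-09 02:00:03.887', 'ID': 'a'}
```

The second lazy group matches the empty string, so the filtered result has the same fields as the Logstash output in the question, with no somedata.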
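The simple solution to your problem is to add a $ at the end of your pattern. $ is an end-of-string anchor, so your second .*? is forced to keep expanding until it has matched the remaining a:

match => { "message" => "^%{TIMESTAMP_ISO8601:timeStamp},%{DATA:ID},%{DATA:somedata}$" }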
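The effect of the anchor can be checked with a small Python sketch on the raw tail of the pattern. As before, we assume Python's re treats lazy quantifiers and `$` like grok's Ruby engine does for input this simple; the line `t,a,b,c` is a made-up example.

```python
import re

line = "2017-01-09 02:00:03.887,a,a"

# Without an anchor, the second lazy group is satisfied by the empty string.
without_anchor = re.search(r",(.*?),(.*?)", line)
print(without_anchor.groups())  # ('a', '')

# With "$", the second lazy group must expand until the end of the line.
with_anchor = re.search(r",(.*?),(.*?)$", line)
print(with_anchor.groups())  # ('a', 'a')

# Hypothetical input with extra commas: the anchored last group absorbs them all.
extra = re.search(r",(.*?),(.*?)$", "t,a,b,c")
print(extra.groups())  # ('a', 'b,c')
```

The last case shows a trade-off of the anchor: with `$`, the final DATA takes everything after the second comma, including any further commas.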