With Grok Debugger I am trying to parse some custom data:
1 1 "Device 1" 1 "Input 1" 0 "On" "Off" "2020-01-01T00:00:00.1124303+00:00"
So far I have:
%{INT:id} %{INT:device} %{QUOTEDSTRING:device_name} %{INT:input} %{QUOTEDSTRING:input_name} %{INT:state} %{QUOTEDSTRING:on_phrase} %{QUOTEDSTRING:off_phrase} \"%{TIMESTAMP_ISO8601:when}\"
However, I am getting double quotes around the strings matched by %{QUOTEDSTRING}, and two sets of hours and minutes from %{TIMESTAMP_ISO8601:when}:
{
"id": [
[
"1"
]
],
"device": [
[
"1"
]
],
"device_name": [
[
""Device 1""
]
],
"input": [
[
"1"
]
],
"input_name": [
[
""Input 1""
]
],
"state": [
[
"0"
]
],
"on_phrase": [
[
""On""
]
],
"off_phrase": [
[
""Off""
]
],
"when": [
[
"2020-01-01T00:00:00.1124303+00:00"
]
],
"YEAR": [
[
"2020"
]
],
"MONTHNUM": [
[
"01"
]
],
"MONTHDAY": [
[
"01"
]
],
"HOUR": [
[
"00",
"00"
]
],
"MINUTE": [
[
"00",
"00"
]
],
"SECOND": [
[
"00.1124303"
]
],
"ISO8601_TIMEZONE": [
[
"+00:00"
]
]
}
Also, I am a little stuck when it comes to logstash.conf, as I am not sure what I would put as the index in the output. The following configuration is from a previous example on GitHub:
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    manage_template => false
    index => "sample-%{+YYYY.MM.dd}"
  }
}
I'm guessing mine would look something like this:
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "%{INT:id} %{INT:device} %{QUOTEDSTRING:device_name} %{INT:input} %{QUOTEDSTRING:input_name} %{INT:state} %{QUOTEDSTRING:on_phrase} %{QUOTEDSTRING:off_phrase} \"%{TIMESTAMP_ISO8601:when}\"" }
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    manage_template => false
    index => "sample-%{????????}"
  }
}
Again, I'm unclear as to what I am supposed to put in place of "sample-%{????????}".
Regarding the doubled double quotes: QUOTEDSTRING captures the surrounding quotes as part of the match, so put literal quotes in the pattern and use DATA between them instead:
"%{DATA:device_name}"
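The duplicated entries for the hours and minutes come from the timezone offset: the first entry is the actual hour, and the second is the hour of the offset (+00:00), because ISO8601_TIMEZONE reuses the HOUR and MINUTE patterns. The same applies to the minutes.
To get rid of them, you would need a custom pattern that spells out the offset without those named patterns: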
"(?<when>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?(?<ISO8601_TIMEZONE>Z|[+-](?:2[0123]|[01]?[0-9])(?::?(?:[0-5][0-9])))?)"
(If you are not interested in parsing the timestamp at all, just use DATA again.)
So, your pattern might look like this:
%{INT:id} %{INT:device} "%{DATA:device_name}" %{INT:input} "%{DATA:input_name}" %{INT:state} "%{DATA:on_phrase}" "%{DATA:off_phrase}" "(?<when>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?(?<ISO8601_TIMEZONE>Z|[+-](?:2[0123]|[01]?[0-9])(?::?(?:[0-5][0-9])))?)"
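Regarding the index: %{+YYYY.MM.dd} is a date format that is filled in from the event's timestamp, so there is no field name to put inside it. Use:
logstash-%{+YYYY.MM.dd} or sample-%{+YYYY.MM.dd} if you want to have separate indexes for each day
sample- to have just one index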
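Putting the pieces together, a minimal sketch of the resulting logstash.conf might look like the following. It assumes the same Beats input and Elasticsearch host as the example in the question, and it picks the daily sample- index purely for illustration. The grok pattern is wrapped in single quotes here (a choice made for this sketch) so that the literal double quotes inside it need no escaping.

```
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    # Single-quoted string: the literal double quotes in the pattern need no escaping.
    match => { "message" => '%{INT:id} %{INT:device} "%{DATA:device_name}" %{INT:input} "%{DATA:input_name}" %{INT:state} "%{DATA:on_phrase}" "%{DATA:off_phrase}" "(?<when>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?(?<ISO8601_TIMEZONE>Z|[+-](?:2[0123]|[01]?[0-9])(?::?(?:[0-5][0-9])))?)"' }
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    manage_template => false
    # %{+YYYY.MM.dd} is formatted from the event's @timestamp, giving one index per day.
    index => "sample-%{+YYYY.MM.dd}"
  }
}
```

Note that the date in the index name is taken from @timestamp, not from the when field captured by grok, so the two can differ unless the event timestamp is set from when elsewhere in the filter section.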