aws-glue, grok

How to parse application logs with no fixed field order or structure using Grok


I'm parsing an application log with Grok and testing the patterns at https://grokconstructor.appspot.com/do/match.

The log looks like this:

2023-04-01 02:00:00,007 [nioEventLoopGroup-13-13] INFO {"deviceid":"aaaaaaaaaa","userAgent":"device"}
2023-04-01 02:00:01,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device","deviceid":"bbbbbbbbbb"}
2023-04-01 02:00:02,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device"}

My Grok pattern:

%{GENERATE_TIME:generateTime}.*?%{DEVICEID:deviceId}.*?%{AGENT:userAgent}

Custom pattern:

GENERATE_TIME \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}
DEVICEID deviceid":"(.{10})
AGENT "userAgent":"(.*?)"
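
To reproduce the behaviour outside the Grok tester, the expression can be expanded by hand into a plain regular expression. The sketch below is only an approximation: Python's `re` stands in for the Grok engine, and the expansion of each `%{NAME:field}` into a named group around the whole custom pattern is done manually. The names `expanded` and `try_match` are made up for the illustration.

```python
import re

# The three custom patterns, copied as-is.
GENERATE_TIME = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}'
DEVICEID = r'deviceid":"(.{10})'
AGENT = r'"userAgent":"(.*?)"'

# Hand expansion: %{NAME:field} becomes a named group wrapping the whole
# custom pattern, so the field holds the full match, not the inner (...) group.
expanded = re.compile(
    rf'(?P<generateTime>{GENERATE_TIME}).*?'
    rf'(?P<deviceId>{DEVICEID}).*?'
    rf'(?P<userAgent>{AGENT})'
)

lines = [
    '2023-04-01 02:00:00,007 [nioEventLoopGroup-13-13] INFO {"deviceid":"aaaaaaaaaa","userAgent":"device"}',
    '2023-04-01 02:00:01,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device","deviceid":"bbbbbbbbbb"}',
    '2023-04-01 02:00:02,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device"}',
]

def try_match(line):
    """Return the named fields for a line, or None when the pattern fails."""
    m = expanded.search(line)
    return m.groupdict() if m else None

results = [try_match(line) for line in lines]
for line, fields in zip(lines, results):
    print(fields)
```

In this approximation only the first line matches, and its `deviceId` field carries the `deviceid":"` prefix along with the value. The second line fails because `userAgent` comes before `deviceid`, and the third fails because `deviceid` is missing.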


Expected Output:

[
    {
        "generateTime": "2023-04-01 02:00:00,007",
        "deviceId": "aaaaaaaaaa",
        "userAgent": "device"
    },
    {
        "generateTime": "2023-04-01 02:00:01,234",
        "deviceId": "bbbbbbbbbb",
        "userAgent": "device"
    },
    {
        "generateTime": "2023-04-01 02:00:02,234",
        "userAgent": "device"
    }
]

There seem to be two problems to solve: how to capture only the values of deviceId and userAgent (without the surrounding key and quotes), and how to parse lines whose fields appear in any order or are missing.

Thanks in advance.


Solution

  • Could you try the Grok patterns below and report back whether they work?

    filter {
      grok {
        # Patterns are tried in order; each must sit on a single line,
        # because a line break inside the quotes becomes part of the pattern.
        match => {
          "message" => [
            '%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:event}] %{LOGLEVEL:loglevel} %{DATA:deviceid}:%{DATA:id},%{DATA:useragent}:"%{DATA:agentname}"',
            '%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:event}] %{DATA:loglevel} %{DATA:useragent}:"%{DATA:agentname}"'
          ]
        }
      }
    }
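
If listing one pattern per field order becomes unwieldy, another option is to make each field an optional lookahead anchored right after the timestamp, so that the order of the JSON keys no longer matters. The sketch below shows the idea with Python's `re` rather than Grok. The names `LINE` and `parse` are made up for the illustration, and whether the Grok engine in use (AWS Glue or Logstash) accepts named groups inside lookaheads is an assumption that would need testing.

```python
import re

# Timestamp first, then one optional lookahead per field. Each lookahead
# scans forward from the same position, so field order does not matter,
# and the trailing "?" lets a missing field fall through as None.
LINE = re.compile(
    r'^(?P<generateTime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})'
    r'(?=(?:.*?"deviceid":"(?P<deviceId>[^"]*)")?)'
    r'(?=(?:.*?"userAgent":"(?P<userAgent>[^"]*)")?)'
)

def parse(line):
    """Return the captured fields, leaving out any that were not present."""
    m = LINE.match(line)
    if m is None:
        return None
    return {name: value for name, value in m.groupdict().items() if value is not None}

sample = [
    '2023-04-01 02:00:00,007 [nioEventLoopGroup-13-13] INFO {"deviceid":"aaaaaaaaaa","userAgent":"device"}',
    '2023-04-01 02:00:01,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device","deviceid":"bbbbbbbbbb"}',
    '2023-04-01 02:00:02,234 [nioEventLoopGroup-13-13] INFO {"userAgent":"device"}',
]

parsed = [parse(line) for line in sample]
for fields in parsed:
    print(fields)
```

Because the capture groups wrap only `[^"]*`, the fields hold the bare values without the key or the quotes. A line with no `deviceid` simply yields a result without that key.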