
Datadog Grok Parsing - extracting fields from nested JSON


Is it possible to extract json fields that are nested inside a log?

Sample I've been working on:

thread-191555 app.main - [cid: 2cacd6f9-546d-41ew-a7ce-d5d41b39eb8f, uid: e6ffc3b0-2f39-44f7-85b6-1abf5f9ad970] Request: protocol=[HTTP/1.0] method=[POST] path=[/metrics] headers=[Timeout-Access: <function1>, Remote-Address: 192.168.0.1:37936, Host: app:5000, Connection: close, X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, Authorization: ***, Accept: application/json, text/plain, */*, Referer: https://google.com, Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, Cache-Control: no-cache] entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }

What I want to achieve is:

{
"extract": "text",
"duration": "451"
}

I tried to combine a sample regex ("(extract)"\s*:\s*"([^"]+)",?) with example_parser %{data::json} (using just the JSON as the sample log data, for starters), but I haven't managed to get anything working.

Thanks in advance!


Solution

  • Is that sample text formatted properly? The final entity object is missing a ] from the end.

    entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }

    should be

    entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }]

    I'm going to continue these instructions assuming that was a typo and the entity field actually ends with ]. If it doesn't, I think you need to fix the underlying log to be formatted properly and close out the bracket.


    Instead of just skipping the entire log and only parsing out that json bit, I decided to parse the entire thing and show what would look good as a final result. So the first thing we need to do is pull out that set of key/value pairs after the request object:

    Example Input: thread-191555 app.main - [cid: 2cacd6f9-546d-41ew-a7ce-d5d41b39eb8f, uid: e6ffc3b0-2f39-44f7-85b6-1abf5f9ad970] Request: protocol=[HTTP/1.0] method=[POST] path=[/metrics] headers=[Timeout-Access: <function1>, Remote-Address: 192.168.0.1:37936, Host: app:5000, Connection: close, X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, Authorization: ***, Accept: application/json, text/plain, */*, Referer: https://google.com, Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, Cache-Control: no-cache] entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }]

    Grok parser rule: app_log thread-%{integer:thread} %{notSpace:file} - \[%{data::keyvalue(": ")}\] Request: %{data:request:keyvalue("=","","[]")}

    Result:

    {
      "thread": 191555,
      "file": "app.main",
      "cid": "2cacd6f9-546d-41ew-a7ce-d5d41b39eb8f",
      "uid": "e6ffc3b0-2f39-44f7-85b6-1abf5f9ad970",
      "request": {
        "protocol": "HTTP/1.0",
        "method": "POST",
        "path": "/metrics",
        "headers": "Timeout-Access: <function1>, Remote-Address: 192.168.0.1:37936, Host: app:5000, Connection: close, X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, Authorization: ***, Accept: application/json, text/plain, */*, Referer: https://google.com, Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, Cache-Control: no-cache",
        "entity": "HttpEntity.Strict application/json {\"type\":\"text\",\"extract\": \"text\", \"field2\":\"text2\",\"duration\": 451 }"
      }
    }
    


    Notice how we use the keyvalue parser with a quoting string of [], that allows us to easily pull out everything from the request object.
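
    To make the effect of the [] quoting easier to picture, here is a rough Python approximation of that step. It is only a sketch, not how Datadog implements keyvalue: the name parse_request and the regex are my own, and it assumes no value contains a ] of its own, which holds for this sample.

```python
import re

# Approximates keyvalue("=", "", "[]") for this sample only: every value is
# wrapped in [ ] and is assumed to contain no "]" of its own.
PAIR = re.compile(r"(\w+)=\[([^\]]*)\]")


def parse_request(text: str) -> dict:
    """Return a dict built from every key=[value] pair found in text."""
    return {key: value for key, value in PAIR.findall(text)}


# Shortened version of the request part of the sample log (headers omitted).
sample = (
    "protocol=[HTTP/1.0] method=[POST] path=[/metrics] "
    'entity=[HttpEntity.Strict application/json {"extract": "text", "duration": 451 }]'
)

request = parse_request(sample)
print(request["method"])
print(request["entity"])
```

    Because the brackets delimit each value, the spaces inside entity do not split it into separate keys, which is the same reason the quoting string matters in the grok rule.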


    Now the goal is to pull out the details from that entity field inside of the request object. With Grok parsers you can specify a specific attribute to parse further.

    So in that same pipeline we'll add another grok parser processor, right after the first one.

    Then configure the processor's advanced options section to run on request.entity, since that is what we named the attribute.

    Example Input: HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }

    Grok Parser Rule: entity_rule %{notSpace:request.entity.class} %{notSpace:request.entity.media_type} %{data:request.entity.json:json}

    Result:

    {
      "request": {
        "entity": {
          "class": "HttpEntity.Strict",
          "media_type": "application/json",
          "json": {
            "duration": 451,
            "extract": "text",
            "type": "text",
            "field2": "text2"
          }
        }
      }
    }
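
    For comparison, the same split can be sketched in plain Python. This is an illustration of what the rule does to the string, not Datadog code; parse_entity is a hypothetical helper, and it assumes the class and media type never contain spaces (which is what notSpace assumes too).

```python
import json


def parse_entity(entity: str) -> dict:
    """Split 'class media_type {json}' into three parts and decode the JSON."""
    # Two splits at most: everything after the second space is the JSON body.
    cls, media_type, body = entity.split(" ", 2)
    return {"class": cls, "media_type": media_type, "json": json.loads(body)}


entity = (
    'HttpEntity.Strict application/json '
    '{"type":"text","extract": "text", "field2":"text2","duration": 451 }'
)

parsed = parse_entity(entity)

# Keep only the two fields the question asked for.
wanted = {key: parsed["json"][key] for key in ("extract", "duration")}
print(wanted)
```

    Note that duration comes back as the number 451 rather than the string "451" from the question, matching the grok result above.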
    

    Now when we look at the final parsed log, it has everything we need broken out.


    Also just because it was really simple, I threw in a third grok processor for the headers chunk as well (the advanced settings are set to parse from request.headers):

    Example Input: Timeout-Access: <function1>, Remote-Address: 192.168.0.1:37936, Host: app:5000, Connection: close, X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, Authorization: ***, Accept: application/json, text/plain, */*, Referer: https://google.com, Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, Cache-Control: no-cache

    Grok Parser Rule: headers_rule %{data:request.headers:keyvalue(": ", "/)(; :")}

    Result:

    {
      "request": {
        "headers": {
          "Timeout-Access": "function1",
          "Remote-Address": "192.168.0.1:37936",
          "Host": "app:5000",
          "Connection": "close",
          "X-Real-Ip": "192.168.1.1",
          "X-Forwarded-For": "192.168.1.1",
          "Accept": "application/json",
          "Referer": "https://google.com",
          "Accept-Language": "cs-CZ",
          "Accept-Encoding": "gzip",
          "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko",
          "Cache-Control": "no-cache"
        }
      }
    }
    

    The only tricky bit here is that I had to define a characterWhiteList of /)(; :, mostly to handle all the special characters in the User-Agent field.
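
    If you ever need to do the same split outside Datadog, here is a small Python sketch. It deliberately uses a different technique from the keyvalue filter: it splits only on a ", " that is followed by something that looks like a header name, so it does not need a whitelist. parse_headers and the regex are my own assumptions, and unlike the result above it keeps the full comma-separated Accept value.

```python
import re

# Split on ", " only when what follows looks like "Header-Name: ".
# Assumes header names consist of letters and hyphens, as in this sample.
NEXT_HEADER = re.compile(r", (?=[A-Za-z][A-Za-z-]*: )")


def parse_headers(text: str) -> dict:
    """Turn 'Name: value, Name: value' into a dict, keeping commas in values."""
    headers = {}
    for part in NEXT_HEADER.split(text):
        # Only the first ": " separates name from value ("app:5000" stays whole).
        name, _, value = part.partition(": ")
        headers[name] = value
    return headers


# Shortened version of the headers string from the sample log.
sample = (
    "Host: app:5000, Accept: application/json, text/plain, */*, "
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, "
    "Cache-Control: no-cache"
)

headers = parse_headers(sample)
print(headers["Accept"])
print(headers["User-Agent"])
```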


    References:

    Just the documentation and some guess & checking in my personal Datadog account.

    https://docs.datadoghq.com/logs/processing/parsing/?tab=matcher#key-value-or-logfmt