Search code examples
regexlogginglogstash-grokdatadoggrok

Grok pattern to extract specific data only from log


I am trying to extract some specific data from the postgresql logs using the grok parsing rules in datadog. I am trying to extract the following in json format from the logs below

{
   dbuser {
     AROAXXXXXXXXXXXXXXXXX : username
     }
}

Logs from which I am trying to extract the above information

2022-11-11 09:09:15 UTC:10.116.0.244(57888):AROAXXXXXXXXXXXXXXXXX:username@database_name:[592]:LOG:  AUDIT: SESSION,3016,1,READ,SELECT,,,"/*pga4dash*/
2022-11-11 09:20:53 UTC:10.116.0.244(57946):AROAXXXXXXXXXXXXXXXXX:username@database_name:[7696]:LOG:  pam_authenticate failed: Permission denied 
2022-11-11 09:27:02 UTC:10.116.0.244(57984):AROAXXXXXXXXXXXXXXXXX:username@database_name:[8328]:LOG:  AUDIT: SESSION,1,1,ROLE,ALTER ROLE,,,ALTER USER app_user SET pgaudit.log TO 'NONE';,<not logged>
2022-11-11 09:21:57 UTC:10.117.0.98(44764):AROAXXXXXXXXXXXXXXXXX:username@database_name:[2873]:FATAL:  pg_hba.conf rejects connection for host "10.117.0.98", user "AROAXXXXXXXXXXXXXXXXX:username", database "database_name", SSL off
*   Trying 127.0.0.1:1108...
* Connected to rdsauthproxy (127.0.0.1) port 1108 (#0)
> POST /authenticateRequest HTTP/1.1
Host: rdsauthproxy:1108
Accept: */*
Content-Length: 1884
Content-Type: multipart/form-data; boundary=------------------------1b12ee5d61245d84
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: text/html;charset=utf-8
< Content-Length: 0
< 
* Connection #0 to host rdsauthproxy left intact

What I have tried and achieved so far

This is what I have tried so far in terms of generalisation which should work on all logs but this gives me no output.

%{date("yyyy-MM-dd HH:mm:ss z"):}\:%{ipv4:}\(%{number:}\)\:%{data:dbuser:keyvalue(":")}

If I use the following then it gives me the desired output but only works for first pattern of log that I have mentioned above.

%{date("yyyy-MM-dd HH:mm:ss z"):}\:%{ipv4:}\(%{number:}\)\:%{data:dbuser:keyvalue(":")}:\[592\]\:LOG\:\s+AUDIT\:\s+SESSION,%{integer:},1,READ,SELECT,,,\"%{notSpace:}%{data}

If there is a way to ignore all the logs and only just extract the exact match then please help me out.


Solution

  • So I was able to figure out the solution for the above question. The following is the parse rule I used which help me achieve what I wanted.

    %{date("yyyy-MM-dd HH:mm:ss z"):}\:%{ipv4:}\(%{number:}\)\:%{word}:%{data:database.username}:%{data}