
Logstash Grok: match up to the last comma before the beginning of the UserAgent


I have this log message:

"sid-cmascioieiow89322&New*Sou,th%20Skvn%20and%20ir&o,n%20Age,Mozilla/5.0 (Linux; Android 6.0; CHM-U01 Build/HonorCHM-U01) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.98 Mobile Safari/537.36"

And the pattern:

"(?<p1>[^&]*)&(?<p2>[^,]*),%{GREEDYDATA:User_Agent}"

The problem is that p2 sometimes contains zero, one, or more than one comma. I want to match up to the last comma before the UserAgent, because the UserAgent itself sometimes contains commas.

This is the grok debugger link: https://grokdebug.herokuapp.com/

Now:

{
    "p1": [
        "sid-cmascioieiow89322"
    ],
    "p2": [
        "New*Sou"
    ],
    "User_Agent": [
        "th%20Skvn%20and%20ir&o,n%20Age,Mozilla/5.0 (Linux; Android 6.0; CHM-U01 Build/HonorCHM-U01) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.98 Mobile Safari/537.36"
    ]
}

I want this:

{
    "p1": [
        "sid-cmascioieiow89322"
    ],
    "p2": [
        "New*Sou,th%20Skvn%20and%20ir&o,n%20Age"
    ],
    "User_Agent": [
        "Mozilla/5.0 (Linux; Android 6.0; CHM-U01 Build/HonorCHM-U01) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.98 Mobile Safari/537.36"
    ]
}

Thank you for your help.


Solution

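  • The part of the string that you want to capture into p2 contains no whitespace. So, instead of [^,]*, which matches zero or more characters other than a comma and therefore stops at the first comma, you can use \S*, which matches zero or more non-whitespace characters, as many as possible. Because \S* is greedy, \S*, ends at the last comma in that run of non-whitespace characters, which is the comma right before the user agent. This relies on the first whitespace-free token of the user agent (here Mozilla/5.0) containing no comma.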

    (?<p1>[^&]*)&(?<p2>\S*),%{GREEDYDATA:User_Agent}
                 ^^^^^^^^^^
    

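To check the difference outside the Grok debugger, here is a minimal Python sketch that runs both patterns against the log line from the question. It is only an approximation of what grok does: Python's `re` module writes named groups as `(?P<name>...)` rather than `(?<name>...)`, and `%{GREEDYDATA:User_Agent}` is replaced by the equivalent `(?P<User_Agent>.*)`. The names `ORIGINAL`, `FIXED`, and `parse` are made up for this example.

```python
import re

# Assumption: grok's (?<name>...) is rewritten as Python's (?P<name>...),
# and %{GREEDYDATA:User_Agent} is expanded to a plain greedy .* group.
ORIGINAL = re.compile(r"(?P<p1>[^&]*)&(?P<p2>[^,]*),(?P<User_Agent>.*)")
FIXED = re.compile(r"(?P<p1>[^&]*)&(?P<p2>\S*),(?P<User_Agent>.*)")

# The log message from the question, split only to keep lines short.
LOG = (
    "sid-cmascioieiow89322&New*Sou,th%20Skvn%20and%20ir&o,n%20Age,"
    "Mozilla/5.0 (Linux; Android 6.0; CHM-U01 Build/HonorCHM-U01) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.98 Mobile Safari/537.36"
)


def parse(pattern, line):
    """Return the named captures as a dict, or None if the line does not match."""
    m = pattern.match(line)
    return m.groupdict() if m else None


before = parse(ORIGINAL, LOG)
after = parse(FIXED, LOG)

print("original p2:", before["p2"])         # stops at the first comma
print("fixed p2:   ", after["p2"])          # runs to the last comma before the first space
print("fixed UA:   ", after["User_Agent"])  # starts at Mozilla/5.0
```

With `[^,]*` the p2 capture stops at `New*Sou`; with `\S*` the engine first consumes everything up to the first space, then backtracks to the nearest comma, so the split lands right before `Mozilla/5.0`.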