regex, logstash-grok

Regex pattern for the URL below


Can someone give me a regex pattern for the example URL below?

https://test.example.com/new/index.html?token=0A44AD94

If it is possible to split the URL into separate fields, I could easily populate those values to monitor each user's activity.


Solution

  • You can use the following regex to match URLs of this kind, where the third-party component is optional and the params always appear in a fixed order:

    https?://(?<domain>[^/]*).*\btoken=(?<token>[^&]*).*\bvalue=(?<value>[^&]*)(?:.*\bexit=(?<thirdparty>[^&]*))?
                                                                                ^^                             ^^
    

    Note that [^/]* matches zero or more characters other than /, and [^&]* matches zero or more characters other than & (which keeps a param value in the query string from overmatching into the next param). (?:...)? is an optional non-capturing group: the regex still succeeds whether or not that part is present.

    UPDATE:

    After checking a few things, I think this grok pattern will work for you:

    %{IPORHOST:clientip} (%{USER:ident}|-) (%{USER:auth}|-) \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "%{URIPROTO}://(?<domain>[^/]*).*[&?]token=(?<token>[^&]*).*[&?]value=(?<value>[^&]*)(?:.*[&?]exit=(?<thirdparty>[^"&]*))?"(?:\s*%{QS:agent})?
    

    Note that %{QS:agent} already matches the surrounding double quotation marks (QS is the quoted-string pattern), so no extra quotes are written around it.
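
To try the named groups outside Logstash, here is a minimal Python sketch. It rests on a few assumptions: Python's re module spells named groups (?P<name>...) instead of (?<name>...), the scheme expression in the second pattern is only a simplified stand-in for %{URIPROTO}, and the value= and exit= params in the sample URLs are made up for illustration. The helper name split_fields is hypothetical as well.

```python
import re

# Python's re module writes named groups as (?P<name>...) instead of
# (?<name>...); otherwise this is the standalone regex from the answer.
URL_PATTERN = re.compile(
    r"https?://(?P<domain>[^/]*)"
    r".*\btoken=(?P<token>[^&]*)"
    r".*\bvalue=(?P<value>[^&]*)"
    r"(?:.*\bexit=(?P<thirdparty>[^&]*))?"
)

# The URL part of the grok pattern, wrapped in the double quotes of the log
# field. The scheme expression is a simplified stand-in for %{URIPROTO}.
QUOTED_URL_PATTERN = re.compile(
    r'"[A-Za-z][A-Za-z0-9+.-]*://(?P<domain>[^/]*)'
    r'.*[&?]token=(?P<token>[^&]*)'
    r'.*[&?]value=(?P<value>[^&]*)'
    r'(?:.*[&?]exit=(?P<thirdparty>[^"&]*))?"'
)


def split_fields(pattern, text):
    """Return the named groups as a dict, or None when nothing matches."""
    match = pattern.search(text)
    return match.groupdict() if match else None


# Hypothetical URLs: the value= and exit= params are invented for the demo.
BASE = "https://test.example.com/new/index.html?token=0A44AD94&value=7"
with_exit = split_fields(URL_PATTERN, BASE + "&exit=partner.example.org")
without_exit = split_fields(URL_PATTERN, BASE)
quoted = split_fields(QUOTED_URL_PATTERN, '"' + BASE + '&exit=partner.example.org"')

for fields in (with_exit, without_exit, quoted):
    print(fields)
```

When exit= is absent, the optional group is skipped and thirdparty comes back as None, while domain, token and value are still captured. A URL that lacks token= or value= does not match at all, because only the exit part is optional.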