Regex to skip first word and parse the rest of the message


I've been trying to get the right regex for skipping the first word and parsing the rest of the message.

I've been testing the regex by running Logstash locally with this filter:

grok {
    match => { "resource" => "/[^/]+/[^/]+(/|)(?<repo>[^/]+)?(/%{GREEDYDATA:resource_path})?" }
}

Test Messages:

/list/Lighter-test-group/xyz/123
/list/
/list

For these messages:

/list/Lighter-test-group/xyz/123 gives a repo value of "Lighter-test-group", which is valid.
/list/ gives a repo value of null, which is valid.
/list gives a repo value of "list", which is invalid. The correct value should be empty or null.

Solution

  • I'm not sure whether you are restricted to a single long regex, but I would look into custom patterns to ignore the first word.

    Using this grok debugger, I set up some custom patterns in the 3rd box:

    IGNORE /\b\w+\b
    REPO [A-Za-z]([A-Za-z0-9+\-.]+)+
    

    And tested out this grok pattern in the 2nd box:

    %{IGNORE}(/)?(%{REPO:repo})?(%{GREEDYDATA:resource_path})
    

    With these custom patterns, I was able to get what I think is your desired output, but test them against more use cases if you have any.
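To try more use cases without the debugger, the custom patterns can be expanded by hand into one regex and checked with a short script. This is only a sketch: it assumes Python's `re` module behaves like grok's regex engine for these simple constructs, it writes the named groups in Python's `(?P<name>...)` syntax, and the `parse` helper is a hypothetical name introduced for illustration. Anything it shows should be confirmed in Logstash itself.

```python
import re

# Custom patterns from the answer, copied as written.
IGNORE = r"/\b\w+\b"
REPO = r"[A-Za-z]([A-Za-z0-9+\-.]+)+"
# GREEDYDATA is ".*" in the stock grok pattern set.
GREEDYDATA = r".*"

# Hand expansion of: %{IGNORE}(/)?(%{REPO:repo})?(%{GREEDYDATA:resource_path})
PATTERN = re.compile(
    rf"{IGNORE}(/)?(?P<repo>{REPO})?(?P<resource_path>{GREEDYDATA})"
)


def parse(resource):
    """Return the named captures for one message, or None if nothing matches."""
    match = PATTERN.search(resource)
    return match.groupdict() if match else None


# The three test messages from the question.
for message in ["/list/Lighter-test-group/xyz/123", "/list/", "/list"]:
    print(message, "->", parse(message))
```

In this sketch `/list` and `/list/` both leave `repo` unset (`None` in Python), and the first message yields `Lighter-test-group`. Two details are worth checking against your real data: `resource_path` keeps its leading slash (`/xyz/123`), and `REPO` as written needs at least two characters starting with a letter, so a segment such as `a` or `123` would not be captured as `repo`.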