
Logstash grok pattern to extract the part of a string that starts and ends with given text


I am trying to extract the application ID.

For example, I need to extract

application_1621858977521_0074

from the following log line

 /yarn/container-logs/application_1621858977521_0074/container_1621858977521_0074_01_000004 [2021-06-08 05:40:06,231] INFO Changing view acls groups to:  (org.apache.spark.SecurityManager)

I have tried the following custom grok pattern, but it doesn't work.

%{(^application_:/$):appID}%

I appreciate your suggestions and help.


Solution

  • You can use

    /(?<applicationId>application(?:_[0-9]+)+)/
    

    Quick alternatives are:

    /(?<applicationId>application(?:_\w+)+)/
    /(?<applicationId>application_[^/]*)
    

    See the regex demo.

    The main pattern, with / added on both sides to match the path separators:

    • / - a / char
    • (?<applicationId>application(?:_[0-9]+)+) - Group "applicationId": application and then one or more repetitions of _ and one or more digits
    • / - a / char

    In the last alternative, [^/]* matches zero or more chars other than /, so the capture stops at the next path separator.
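
If you want to check the expression outside Logstash, here is a minimal Python sketch that runs the main pattern against the log line from the question. It is only an illustration: the names APP_ID, log_line and app_id are made up for this example, and Python spells named groups as (?P<name>...) where grok uses (?<name>...), so the pattern is adapted accordingly.

```python
import re

# Same expression as the grok pattern above, except that Python writes
# named groups as (?P<name>...) instead of (?<name>...).
# APP_ID, log_line and app_id are illustrative names, not part of Logstash.
APP_ID = re.compile(r"/(?P<applicationId>application(?:_[0-9]+)+)/")

log_line = (
    "/yarn/container-logs/application_1621858977521_0074/"
    "container_1621858977521_0074_01_000004 "
    "[2021-06-08 05:40:06,231] INFO Changing view acls groups to:  "
    "(org.apache.spark.SecurityManager)"
)

# search() scans the whole line, like an unanchored grok match.
match = APP_ID.search(log_line)
app_id = match.group("applicationId") if match else None
print(app_id)
```

The surrounding / characters are matched but sit outside the named group, so only the application ID itself is captured.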

    The Grok debugger shows the following output with your given string:

    {
      "applicationId": [
        [
          "application_1621858977521_0074"
        ]
      ]
    }