Logstash Grok filter - naming the fields according to content


I have a question regarding the grok filter in Logstash. Assume that I have two log messages as below:

06 Oct 2014 15:49:23,256 DEBUG [http-8080-1] (com.webratio.units.content.rtx.db.PowerIndexUnitService:45) - [8C590C7717CB12BE96A83F23DA9EE56B][page21][pwu5][trace][127.0.0.1,8C590C7717CB12BE96A83F23DA9EE56B] RESULT_COUNT:2 {oid=[9, 8]}

06 Oct 2014 15:49:23,270 DEBUG [http-8080-1] (com.webratio.units.utility.rtx.db.SelectorUnitService:45) - [8C590C7717CB12BE96A83F23DA9EE56B][page21][seu13][trace][127.0.0.1,8C590C7717CB12BE96A83F23DA9EE56B] RESULT_COUNT:0 {}

My filter is as follows:

filter {
    grok {
        match => [ "message", "%{INT:day} %{MONTH:month} %{YEAR:year} %{TIME:time} %{SPACE} %{WORD:mode} \[%{DATA:http}\] %{SPACE} \(%{DATA:path}\) - \[%{DATA:sessionId}\]\[%{DATA:pageId}\]\[%{DATA:pwuId}\]\[%{DATA:trace}\]%{GREEDYDATA:Info}" ]
    }
}

As you might guess, the filter matches both log messages. However, seu13 in the second message is also captured as pwuId. Is there a way to inspect the content of the field and name the field accordingly?


Solution

  • You would have to do something like this with grok:

    grok {
      match => [ "message",
        "%{INT:day} %{MONTH:month} %{YEAR:year} %{TIME:time} %{SPACE} %{WORD:mode} \[%{DATA:http}\] %{SPACE} \(%{DATA:path}\) - \[%{DATA:sessionId}\]\[%{DATA:pageId}\]\[(?<pwuId>pwu\d+)\]\[%{DATA:trace}\]%{GREEDYDATA:Info}",
        "%{INT:day} %{MONTH:month} %{YEAR:year} %{TIME:time} %{SPACE} %{WORD:mode} \[%{DATA:http}\] %{SPACE} \(%{DATA:path}\) - \[%{DATA:sessionId}\]\[%{DATA:pageId}\]\[(?<seuId>seu\d+)\]\[%{DATA:trace}\]%{GREEDYDATA:Info}"
      ]
    }
    

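    Now you have two patterns: the first matches only the pwu#### values and stores them in pwuId, and the second matches only the seu#### values and stores them in seuId. Grok tries the patterns in order and, by default, stops at the first one that matches.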
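
    If duplicating the long pattern is undesirable, one possible alternative is to keep the single pattern from the question, capture the third bracket under a neutral name, and rename the field afterwards with a conditional. This is a sketch, not a tested configuration: the field name unitId is a hypothetical placeholder, and it assumes the only prefixes are pwu and seu, as in the two sample lines.

    filter {
      grok {
        # Same pattern as in the question, with the third bracket captured as unitId (hypothetical name)
        match => [ "message", "%{INT:day} %{MONTH:month} %{YEAR:year} %{TIME:time} %{SPACE} %{WORD:mode} \[%{DATA:http}\] %{SPACE} \(%{DATA:path}\) - \[%{DATA:sessionId}\]\[%{DATA:pageId}\]\[%{DATA:unitId}\]\[%{DATA:trace}\]%{GREEDYDATA:Info}" ]
      }
      # Rename the neutral field according to its content
      if [unitId] =~ /^pwu\d+$/ {
        mutate { rename => { "unitId" => "pwuId" } }
      } else if [unitId] =~ /^seu\d+$/ {
        mutate { rename => { "unitId" => "seuId" } }
      }
    }

    Values that match neither prefix would stay in unitId, which makes unexpected unit types easy to spot.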