
Correct ELK multiline regular expression?


I am new to ELK and am writing a config file that uses the multiline codec. I need a pattern for the following input data:

110000|read|<soapenv:Envelope>
<head>hello<head>
<body></body>
</soapenv:Envelope>|<soapenv:Envelope>
<body></body>
</soapenv:Envelope>
210000|read|<soapenv:Envelope>
<head>hello<head>
<body></body>
</soapenv:Envelope>|<soapenv:Envelope>
<body></body>
</soapenv:Envelope>
370000|read|<soapenv:Envelope>
<head>hello<head>
<body></body>
</soapenv:Envelope>|<soapenv:Envelope>
<body></body>
</soapenv:Envelope>

The config file I am using is:

input {
  file {
    path => "/opt/test5/practice_new/xml_input.dat"
    start_position => "beginning"
    codec => multiline {
      pattern => "^%{INT}\|%{WORD}\|<soapenv:Envelope*>\|<soapenv"
      negate => true
      what => "previous"
    }
  }
}
filter {
  grok {
    match => [ "message", "%{DATA:method_id}\|%{WORD:method_type}\|%{GREEDYDATA:request}\|%{GREEDYDATA:response}" ]
  }
}

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "xml"
  }
  stdout {}
}

But this pattern does not do what I need.

Please suggest the correct pattern.

Expected output:

For the 1st log entry:

method_id- 110000

method_type-

request-

response-

For the 2nd log entry:

method_id- 210000

method_type-

request-

response-

Similarly for the rest.


Solution

  • First of all, you'll have to fix your multiline pattern so that it matches only the first line of each log entry:

    codec => multiline {
      pattern => "^%{NUMBER:method_id}\|%{DATA:method_type}\|<soapenv:Envelope>"
      negate => true
      what => previous
    }
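
To see why the anchor pattern matters, here is a rough Python sketch (not Logstash code) of what `negate => true` with `what => previous` does: every line that does not match the pattern is appended to the event started by the last matching line. The regexes are hand-written approximations of the grok patterns (`\d+` for NUMBER/INT, `.*?` for DATA, `\w+` for WORD), and `group_events` is a hypothetical helper, so treat this as an illustration only.

```python
import re

# Approximate Python stand-ins for the grok patterns (assumption, not the real grok definitions).
FIXED = re.compile(r"^\d+\|.*?\|<soapenv:Envelope>")
ORIGINAL = re.compile(r"^\d+\|\w+\|<soapenv:Envelope*>\|<soapenv")


def group_events(lines, pattern):
    """Mimic negate => true, what => previous (hypothetical helper)."""
    events = []
    for line in lines:
        if pattern.search(line) or not events:
            events.append(line)        # a matching line starts a new event
        else:
            events[-1] += "\n" + line  # a non-matching line joins the previous event
    return events


# One log entry from the question, with the id left as a placeholder.
ENTRY = (
    "{id}|read|<soapenv:Envelope>\n"
    "<head>hello<head>\n"
    "<body></body>\n"
    "</soapenv:Envelope>|<soapenv:Envelope>\n"
    "<body></body>\n"
    "</soapenv:Envelope>"
)
SAMPLE_LINES = "\n".join(ENTRY.format(id=i) for i in (110000, 210000, 370000)).split("\n")

if __name__ == "__main__":
    print(len(group_events(SAMPLE_LINES, FIXED)))     # 3 events, one per log entry
    print(len(group_events(SAMPLE_LINES, ORIGINAL)))  # 1 event, everything merged
```

With the pattern from the question no line ever matches, because it requires `|<soapenv` right after the opening tag on the same line, so all lines collapse into a single event.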
    

    Afterwards, you can use the pattern Wiktor suggests in the comments in your grok filter:

    (?m)^(?<method_id>\d+)\|(?<method_type>\w+)\|(?<request><soapenv:Envelope>.*?</soapenv:Envelope>)\|(?<response><soapenv:Envelope>.*?</soapenv:Envelope>)
    

    This pattern matches the three log entries in your post when tested on http://grokconstructor.appspot.com.
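
If you want to check the expression outside of Logstash, the following is a minimal Python sketch of the same regex. Two things differ from the grok version and are translations on my part: grok uses Oniguruma with Ruby syntax, where `(?m)` lets `.` match newlines, which corresponds to `re.DOTALL` in Python, and named groups are written `(?P<name>...)`. `parse_event` and `EVENT` are hypothetical names used only for this example.

```python
import re

# Python translation of the grok expression (re.DOTALL plays the role of Oniguruma's (?m)).
ENTRY_RE = re.compile(
    r"^(?P<method_id>\d+)\|(?P<method_type>\w+)\|"
    r"(?P<request><soapenv:Envelope>.*?</soapenv:Envelope>)\|"
    r"(?P<response><soapenv:Envelope>.*?</soapenv:Envelope>)",
    re.DOTALL,
)

# One already-joined multiline event, copied from the first log entry in the question.
EVENT = (
    "110000|read|<soapenv:Envelope>\n"
    "<head>hello<head>\n"
    "<body></body>\n"
    "</soapenv:Envelope>|<soapenv:Envelope>\n"
    "<body></body>\n"
    "</soapenv:Envelope>"
)


def parse_event(event):
    """Return the four captured fields as a dict, or None if the event does not match."""
    match = ENTRY_RE.match(event)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name, value in parse_event(EVENT).items():
        print(f"{name}: {value!r}")
```

The lazy `.*?` stops at the first `</soapenv:Envelope>` that is followed by `|`, which is what separates the request from the response.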


    Your whole config might look like this:

    input {
      file {
        path => "/opt/test5/practice_new/xml_input.dat"
        start_position => "beginning"
        codec => multiline {
          pattern => "^%{NUMBER:method_id}\|%{DATA:method_type}\|<soapenv:Envelope>"
          negate => true
          what => previous
        }
      }
    }
    filter {
      grok {
        match => [ "message", "(?m)^(?<method_id>\d+)\|(?<method_type>\w+)\|(?<request><soapenv:Envelope>.*?</soapenv:Envelope>)\|(?<response><soapenv:Envelope>.*?</soapenv:Envelope>)" ]
      }
    }
    
    output {
      elasticsearch {
        hosts => "http://localhost:9200"
        index => "xml"
      }
      stdout {}
    }