
Grok pattern for data separated by pipe with whitespaces and optional values in it


I have a text/log file in which the values are separated by a pipe symbol ("|") padded with a varying amount of whitespace. Some of the values are empty.

I would also like to do this without gsub.

Does anyone know how to write a grok pattern to extract the values in Logstash? I am very new to it. Thanks in advance.

An example line is below:

5000|       |       |applicationLog     |ClientLog      |SystemLog      |Green      |       |2014-01-07 11:58:48.76948      |12345 (0x1224)|1) Error 2)Sample Log | Configuration Manager

Solution

  • Since the amount of padding and the number of empty columns between the values vary, you can skip over the separators with the lazy wildcard .*? and capture the values themselves with predefined grok patterns:

    %{NUMBER:num}.*?%{WORD:2nd}.*?%{WORD:3rd}.*?%{WORD:4th}.*?%{WORD:5th}.*?%{TIMESTAMP_ISO8601}
    

    which gives you:

    {
      "num": [
        [
          "5000"
        ]
      ],
      "BASE10NUM": [
        [
          "5000"
        ]
      ],
      "2nd": [
        [
          "applicationLog"
        ]
      ],
      "3rd": [
        [
          "ClientLog"
        ]
      ],
      "4th": [
        [
          "SystemLog"
        ]
      ],
      "5th": [
        [
          "Green"
        ]
      ],
      "TIMESTAMP_ISO8601": [
        [
          "2014-01-07 11:58:48.76948"
        ]
      ],
      "YEAR": [
        [
          "2014"
        ]
      ],
      "MONTHNUM": [
        [
          "01"
        ]
      ],
      "MONTHDAY": [
        [
          "07"
        ]
      ],
      "HOUR": [
        [
          "11",
          null
        ]
      ],
      "MINUTE": [
        [
          "58",
          null
        ]
      ],
      "SECOND": [
        [
          "48.76948"
        ]
      ],
      "ISO8601_TIMEZONE": [
        [
          null
        ]
      ]
    }
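
    To use the pattern in a pipeline, it can be wrapped in a grok filter. The following is only a minimal sketch: it assumes the log line arrives in the default "message" field, and it names the timestamp capture "timestamp" (a name chosen here for illustration, not part of the pattern above), because by default the grok filter only stores named captures.

    ```
    filter {
      grok {
        # .*? lazily skips the pipes, the padding and the empty columns between values.
        # "message" as the source field and the "timestamp" name are assumptions.
        match => {
          "message" => "%{NUMBER:num}.*?%{WORD:2nd}.*?%{WORD:3rd}.*?%{WORD:4th}.*?%{WORD:5th}.*?%{TIMESTAMP_ISO8601:timestamp}"
        }
      }
    }
    ```

    Note that the names 2nd to 5th refer to the order of the non-empty words, not to the column positions in the line.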
    

    You can test it in an online grok debugger.

    Since you are new to grok, you might want to read up on the grok filter plugin basics.

    If you can, I'd also suggest having a look at the dissect filter, which is faster and more efficient than grok. From its documentation:

    The Dissect filter is a kind of split operation. Unlike a regular split operation where one delimiter is applied to the whole string, this operation applies a set of delimiters to a string value. Dissect does not use regular expressions and is very fast. However, if the structure of your text varies from line to line then Grok is more suitable. There is a hybrid case where Dissect can be used to de-structure the section of the line that is reliably repeated and then Grok can be used on the remaining field values with more regex predictability and less overall work to do.
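
    For the sample line in the question, a dissect-based version might look like the sketch below. It assumes every line has the same twelve pipe-separated columns and arrives in the "message" field; all field names (app_log, client_log and so on) are hypothetical and chosen only for illustration. Empty columns are dropped with the skip field %{}, and the padding is trimmed afterwards with the mutate filter's strip option rather than gsub.

    ```
    filter {
      dissect {
        # One %{...} per column; %{} discards the empty columns.
        # All field names here are illustrative assumptions.
        mapping => {
          "message" => "%{num}|%{}|%{}|%{app_log}|%{client_log}|%{system_log}|%{color}|%{}|%{timestamp}|%{id}|%{detail}|%{source}"
        }
      }
      mutate {
        # Remove the leading and trailing whitespace left around each value.
        strip => ["app_log", "client_log", "system_log", "color", "timestamp", "id", "detail", "source"]
      }
    }
    ```

    If the number of columns can differ from line to line, this mapping will not match every line, and the grok pattern above is the safer choice.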