Is there a way to parse log messages using rsyslog config and transform them into structured messages?


I am trying to parse log messages and transform them into structured messages using rsyslog. Is there a way to do this with rsyslog config alone? I have not yet explored writing a custom parser or a message modification plugin for this.

I found template list properties which can do some of it. Is there a way to do the following?

  1. Map two fields to a single output name, e.g. "__ts": "2018-09-20 10:18:56.363" (the first two fields in the example below). I would rather not use a regex here, because I am looking for a solution that does not depend on the values of the fields; they could be two arbitrary strings, not just a date and a time.
  2. Extract whatever is left in msg after all known fields have been extracted by position, e.g. "msg": "Unregistering application nameOfAnApiHere with someOtherName with status DOWN".
  3. Use local variables to hold the values of fields from msg, and then use those variables in templates.

Example Log message:

2018-09-20 10:18:56.363 INFO --- [Thread-68] x.y.z.key1Value Unregistering application nameOfAnApiHere with someOtherName with status DOWN

1. rsyslog config template definition

template(name="structure-log-format" type="list") {
  constant(value="{")

  # This only extracts the first field, with the value 2018-09-20.
  # TODO: How can the first 2 fields be mapped to the single __ts field?
  property(outname="__ts" name="msg" field.number="1" field.delimiter="32" format="jsonf") constant(value=", ")

  constant(value="\"event\":[{")
    constant(value="\"payload\":{")
        property(outname="_log_" name="syslogtag" format="jsonf") constant(value=", ")
        property(outname="__loglvl" name="msg" field.number="4" field.delimiter="32" format="jsonf") constant(value=", ")
        property(outname="__thread" name="msg" field.number="7" field.delimiter="32" format="jsonf") constant(value=", ")
        property(outname="__key1" name="msg" field.number="8" field.delimiter="32" format="jsonf") constant(value=", ")
        # The following property includes the full message value, "2018-09-20 ... DOWN".
        # TODO: How can only the part "Unregistering ... DOWN" be included?
        property(name="msg" format="jsonf" droplastlf="on" )
    constant(value="}")
  constant(value="}]} \n")

}

2. Expected result:

{
   "__ts": "2018-09-20 10:18:56.363",
   "event": [
       {
          "payload": {
             "_log_": "catalina",
             "__loglvl": "INFO",
             "__thread": "Thread-68",
             "__key1": "x.y.z.key1Value",
             "msg": "Unregistering application nameOfAnApiHere with someOtherName with status DOWN"
          }
       }
     ]
}

3. Actual result:

{
   "__ts": "2018-09-20",
   "event": [
       {
          "payload": {
             "_log_": "catalina",
             "__loglvl": "INFO",
             "__thread": "Thread-68",
             "__key1": "x.y.z.key1Value",
             "msg": "2018-09-20 10:18:56.363  INFO 2144 --- [Thread-68] x.y.z.key1Value Unregistering application nameOfAnApiHere with someOtherName with status DOWN"
          }
       }
     ]
}

Thank you.


Solution

  • You can use regular expressions to match parts of a message by position rather than by value. For example, replace your outname="__ts" property with:

     property(outname="__ts" name="msg" 
      regex.expression="([^ ]+ +[^ ]+)" 
      regex.type="ERE" 
      regex.submatch="1" format="jsonf")
    

    Here the extended regular expression (ERE) matches one or more non-space characters ([^ ]+), followed by one or more spaces ( +), followed by another run of non-space characters. The () capture these 2 words, together with the spaces between them, as a submatch, and regex.submatch="1" selects it (submatches are counted from 1). The result should be what you want, whatever the two fields contain.

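    You can use a regex for the second requirement too, either by counting "words" and spaces again or with some more precise match. Here the regex skips 6 words by putting a repeat count {6} after the word-and-spaces pattern, then captures the rest with (.*). Since there are now 2 sets of (), the submatch to keep is 2, not 1. The count of 6 is based on your example log message; the message in your actual result has an extra field (2144) after INFO, and for messages like that the count would need to be 7: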

     property(name="msg" 
      regex.expression="([^ ]+ +){6}(.*)" 
      regex.type="ERE" 
      regex.submatch="2" format="jsonf" droplastlf="on" )
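
    For the third point (local variables), here is a rough sketch, not something I have tested against your setup. It assumes a recent rsyslog v8 where the RainerScript functions field() and re_extract() and local variables ($.name) are available. The variable names $.ts and $.rest and the template name are made up for illustration. The set statements have to run in the ruleset before the action that uses the template. Note that field() splits purely on the delimiter (32 is the ASCII code for a space), so the first part does not depend on what the two fields contain:

     # Sketch only: variable and template names are illustrative assumptions.
     # Join the first two space-delimited fields of msg, whatever their values.
     set $.ts = field($msg, 32, 1) & " " & field($msg, 32, 2);

     # Keep what follows the first 6 words (same ERE as above):
     # match 0, submatch 2, empty string if nothing matches.
     set $.rest = re_extract($msg, "([^ ]+ +){6}(.*)", 0, 2, "");

     template(name="structure-log-format-vars" type="list") {
       constant(value="{")
       property(outname="__ts" name="$.ts" format="jsonf")
       constant(value=", ")
       property(outname="msg" name="$.rest" format="jsonf")
       constant(value="}\n")
     }

    This only shows the two variables; the remaining properties of your original template would be added back in the same way. If msg arrives with a leading space (common with some inputs), field number 1 would be empty and the field numbers would have to be shifted by one.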