python json syslog pyparsing

Syslog with JSON Data Parsing using PyParsing


I want to write a syslog parser that transforms my syslog messages, which carry JSON data, into a key=value format and writes the result to a .txt file that I can import into FortiSIEM. FortiSIEM is really picky about which syslog formats it accepts, and I can't get it to parse the original syslog, hence the idea of simplifying the log before it reaches the SIEM.

I have done some testing with PyParsing, but I don't really know how to use it. My output file is being created, but it comes out blank.

I don't think I can share the real syslog, so here is a very rough example of what it looks like:

<140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation - {"key"=value, info:{"key"=value, "key"=value, "key"=value}, info2:{"key"=value, "key"=value},"key"=value}

The script that I have come up with:

from pyparsing import Word, Suppress, alphanums, CharsNotIn, ZeroOrMore, Dict

# Define header
priority = Suppress("<") + Word(alphanums) + Suppress(">")
version = Word(alphanums) + Suppress(" ")
timestamp = CharsNotIn(" ") + Suppress(" ")
hostname = CharsNotIn(" ") + Suppress(" ")
appname = CharsNotIn(" ") + Suppress(" ")
procid = CharsNotIn(" ") + Suppress(" ")
msgid = CharsNotIn("\n")
header = priority + version + timestamp + hostname + appname + procid + msgid

# Define key-value pairs
key = Word(alphanums + "_")
value = CharsNotIn("\n")
pair = key + Suppress("=") + value
kv_pairs = Dict(pair + ZeroOrMore(Suppress(",") + pair))

# Define message format
message = header + Suppress(" ") + kv_pairs

# Open input and output files
with open("syslog.txt") as input_file, open("syslog_output.txt", "w") as output_file:
    for line in input_file:
        try:
            # Convert to key-value format
            parsed_message = message.parseString(line.strip())
            kv_message = " ".join([f"{key}={value}" for key, value in parsed_message.items()])

            # Write the message to the output file
            output_file.write(parsed_message + "\n")
        except Exception as e:
            print(f"Failed to parse line: {line} with error: {e}")

            continue

I get two exceptions when I run the script. I also printed the header and message expressions:

Failed to parse line: "Whole Syslog Text"
 with error: Expected ' ', found '2022'  (at char 7), (line:1, col:8)

Failed to parse line: 
 with error: Expected '<'  (at char 0), (line:1, col:1)

Header:  {Suppress:('<') W:(0-9A-Za-z) Suppress:('>') W:(0-9A-Za-z) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:(
)}

Message:  {Suppress:('<') W:(0-9A-Za-z) Suppress:('>') W:(0-9A-Za-z) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:( ) Suppress:(' ') !W:(
) Suppress:(' ') Dict:({W:(0-9A-Z_a-z) Suppress:('=') !W:(
) [{Suppress:(',') W:(0-9A-Z_a-z) Suppress:('=') !W:(
)}]...})}

I want my output_file to look like this:

<140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation -
key=value
key=value
key=value
...

I need to keep the header so that I can identify the log type in FortiSIEM.


Solution

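  • As I mentioned in the comment, pyparsing skips whitespace by default, so all the + Suppress(" ") terms should be removed. By the time Suppress(" ") is tried, the space has already been skipped, which is why you get Expected ' ', found '2022'.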

    CharsNotIn is an exception to the whitespace-skipping rule; I find Word(printables) works better.

    I replaced your timestamp, hostname, etc. terms with Word(printables), like this:

    from pyparsing import Word, printables, rest_of_line

    timestamp = Word(printables)
    hostname = Word(printables)
    appname = Word(printables)
    procid = Word(printables)
    msgid = rest_of_line
    header = priority + version + timestamp + hostname + appname + '-' + procid + '-' + msgid
    

    I used this code to test the parser:

    header.run_tests("""\
        <140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation - {"key"=value, info:{"key"=value, "key"=value, "key"=value}, info2:{"key"=value, "key"=value},"key"=value}
        """)
    

    and got this:

    <140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation - {"key"=value, info:{"key"=value, "key"=value, "key"=value}, info2:{"key"=value, "key"=value},"key"=value}
    ['140', '1', '2022-05-02T08:31:22.478Z', 'platform', 'dataexport', '-', 'syslog_variation', '-', ' {"key"=value, info:{"key"=value, "key"=value, "key"=value}, info2:{"key"=value, "key"=value},"key"=value}']
    

    You'll have to refine your definition of the key-value pairs. Use pyparsing's QuotedString('"') for the key, since each key is a string in quotes. For the value, you'll need to be more careful to read only up to the next comma or }, not all the way to the \n at the end of the line.
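
The refinement described above might look something like the following. This is only a minimal sketch, assuming pyparsing 3.x and assuming the real payload follows the rough shape shown in the question: quoted keys followed by =, bare keys such as info followed by : and a nested {...} block, and unquoted values that contain no commas or braces. The names scalar, obj, flatten, and to_key_value are made up for illustration, and the dotted prefix for nested keys (info.key=value) is an arbitrary choice, not something FortiSIEM is known to require.

```python
from pyparsing import (
    Forward, Group, Optional, ParseResults, QuotedString, Regex,
    Suppress, Word, ZeroOrMore, alphanums, nums, original_text_for,
    printables,
)

# Header: pyparsing skips whitespace between terms, so no Suppress(" ").
priority = Suppress("<") + Word(nums) + Suppress(">")
version = Word(nums)
timestamp = Word(printables)
hostname = Word(printables)
appname = Word(printables)
procid = Word(printables)
# original_text_for returns the matched header text exactly as written.
header = original_text_for(
    priority + version + timestamp + hostname + appname + "-" + procid + "-"
)

# Payload (assumed shape): {"key"=value, info:{"key"=value, ...}, ...}
LBRACE, RBRACE, COMMA = map(Suppress, "{},")
key = QuotedString('"') | Word(alphanums + "_")
sep = Suppress("=") | Suppress(":")
# A plain value stops at the next comma or brace, not at the end of the line.
scalar = Regex(r"[^,{}]+").set_parse_action(lambda t: t[0].strip())
obj = Forward()
value = Group(obj) | QuotedString('"') | scalar
pair = Group(key + sep + value)
obj <<= LBRACE + Optional(pair + ZeroOrMore(COMMA + pair)) + RBRACE

message = header + Group(obj)


def flatten(pairs, prefix=""):
    """Yield 'key=value' strings; nested keys get their parent key as prefix."""
    for name, val in pairs:
        full_name = f"{prefix}{name}"
        if isinstance(val, ParseResults):
            # Nested {...} block: recurse with a dotted prefix.
            yield from flatten(val, prefix=f"{full_name}.")
        else:
            yield f"{full_name}={val}"


def to_key_value(line):
    """Return the header line followed by one key=value pair per line."""
    header_text, pairs = message.parse_string(line.strip(), parse_all=True)
    return "\n".join([header_text, *flatten(pairs)])


SAMPLE = (
    '<140>1 2022-05-02T08:31:22.478Z platform dataexport - syslog_variation - '
    '{"key"=value, info:{"key"=value, "key"=value, "key"=value}, '
    'info2:{"key"=value, "key"=value},"key"=value}'
)

print(to_key_value(SAMPLE))
```

To use this on the file from the question, call to_key_value(line) for each line and write the returned string plus "\n" to the output file. Skipping lines that are empty after strip() should avoid the second exception (Expected '<' at char 0), which looks like it comes from a blank line in the input.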