
Regex, Grafana Loki, Promtail: Parsing a timestamp from logs using regex


I want to parse a timestamp from logs to be used by loki as the timestamp.
I'm a total noob when it comes to regex.

The log file is from "endlessh", which is essentially a tarpit/honeypot for SSH attackers.

It looks like this:

2022-04-03 14:37:25.101991388  2022-04-03T12:37:25.101Z CLOSE host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26
2022-04-03 14:38:07.723962122  2022-04-03T12:38:07.723Z ACCEPT host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096

What I want to match, using regex, is the second timestamp on each line, since it's a UTC timestamp and should be parseable by Promtail.

I've tried different approaches, but just couldn't get it right at all.

So first of all I need a regex that matches the timestamp I want.
Secondly, I need to write it in a form that exposes the matched value in some way. The docs offer this example:

.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)

AFAIK, those are named groups. Is that all it takes to expose the value so that I can use it in the config?

Would be nice if someone can provide a solution for the regex, and an explanation of what it does :)


Solution

  • You could, for example, write a specific pattern that matches the first (local) timestamp and captures the second (UTC) timestamp in a named group:

    ^\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d+\s+(?P<timestamp>\d{4}-\d{2}-\d{2}T\d\d:\d\d:\d\d\.\d+Z)\b
    


    Or, if the format is always the same, use a very broad pattern: skip an exact number of whitespace-separated fields (here two, the date and time of the first timestamp) and capture the field you want to keep.

    ^(?:\S+\s+){2}(?P<timestamp>\S+)
    

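Both patterns can be sanity-checked against the two sample lines from the question. The sketch below uses Python's `re` module, which accepts the same `(?P<name>...)` named-group syntax. It is only a quick check, not Promtail itself (Promtail uses Go's RE2 engine), and the helper names `extract_timestamp` and `parse_utc` are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Sample lines copied from the question.
LOG_LINES = [
    "2022-04-03 14:37:25.101991388  2022-04-03T12:37:25.101Z CLOSE "
    "host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26",
    "2022-04-03 14:38:07.723962122  2022-04-03T12:38:07.723Z ACCEPT "
    "host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096",
]

# Specific pattern: match the local timestamp, capture the UTC one.
SPECIFIC = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d+\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d\d:\d\d:\d\d\.\d+Z)\b"
)

# Broad pattern: skip two whitespace-separated fields, capture the third.
BROAD = re.compile(r"^(?:\S+\s+){2}(?P<timestamp>\S+)")


def extract_timestamp(pattern, line):
    """Return the text of the 'timestamp' group, or None if the line does not match."""
    match = pattern.match(line)
    return match.group("timestamp") if match else None


def parse_utc(ts):
    """Parse a captured value such as '2022-04-03T12:37:25.101Z' as a UTC datetime."""
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    for line in LOG_LINES:
        for pattern in (SPECIFIC, BROAD):
            ts = extract_timestamp(pattern, line)
            print(ts, "->", parse_utc(ts).isoformat())
```

In Promtail, the named group is what makes the value available to later pipeline stages: the `regex` stage stores it in the extracted data under the key `timestamp`, and a following `timestamp` stage can then refer to that key through its `source` setting.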