How to extract a time field from a message string in Splunk?


I'm trying to plot a visualization chart in Splunk. A sample log event looks like this:

 {TIMESTAMP=2024-04-02 02:26:58 , LEVEL=INFO , APPL=appname ,ENV=test , THREAD=[http-nio-8080-exec-6] , Execution Stopped: StudentController | Total Time Taken: 52 ms}

I want to extract the Total Time Taken value from this log and plot it against the timestamp in Splunk.

I tried the rex command:

index = "xyz"| rex "Total Time Taken:(?<Total Time Taken>\S)" 

But I get this error: Error in 'rex' command: Encountered the following error while compiling the regex 'Total Time Taken:(?<Total Time Taken>\S)': Regex: syntax error in subpattern name (missing terminator).

Can anyone help with a Splunk query that extracts this field and plots it with timechart?


Solution

  • I don't know Splunk, so I don't know what its regex engine can do. Here are several options:

    Using named capturing groups, as in your question:

    • I don't think spaces are allowed in a capturing group name (this is what the "missing terminator" error points at), so remove the spaces or replace them with underscores.

    • Your pattern is also missing the space after the : of the label. If the spacing can vary, replace the space in my pattern with " *" (space, asterisk) or with \s*.

    • \S matches a single non-space character, so even with the space fixed it would capture only the first digit ("5"). You must at least add a + after it so that it matches one or more non-space chars. You may also need the unit, in case it changes: I can imagine that if the time is more than a second you might get a log value such as "3.45 s" instead of "3450 ms", but you would have to check this against your actual log values. To capture the unit too, replace \S with [^,}]+, meaning "not a comma or closing brace char, once or several times".

    This would become:

    Total Time Taken: (?<TotalTimeTaken>[^,}]+)
    

    Test it here: https://regex101.com/r/llI4yQ/1
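
If you want to check the points above outside Splunk, here is a minimal Python sketch run against the sample event from the question. It is only an illustration: Python's re module spells named groups (?P<name>...) where rex uses (?<name>...), and the helper name extract is mine, not anything from Splunk.

```python
import re

# Sample event from the question.
LOG = ("{TIMESTAMP=2024-04-02 02:26:58 , LEVEL=INFO , APPL=appname ,ENV=test , "
       "THREAD=[http-nio-8080-exec-6] , Execution Stopped: StudentController | "
       "Total Time Taken: 52 ms}")


def extract(pattern, text=LOG):
    """Return the TotalTimeTaken group, or None when the pattern does not match."""
    match = re.search(pattern, text)
    return match.group("TotalTimeTaken") if match else None


# No space after the colon: \S cannot match the space, so nothing is found.
no_space = extract(r"Total Time Taken:(?P<TotalTimeTaken>\S)")
# Space added, but \S alone captures a single character.
one_char = extract(r"Total Time Taken: (?P<TotalTimeTaken>\S)")
# \S+ captures the run of non-space characters (the number).
one_token = extract(r"Total Time Taken: (?P<TotalTimeTaken>\S+)")
# [^,}]+ captures everything up to the next comma or closing brace.
with_unit = extract(r"Total Time Taken: (?P<TotalTimeTaken>[^,}]+)")

print(no_space, one_char, one_token, with_unit)  # None 5 52 52 ms
```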

    If you need to separate the value and the unit, then you could change it to something like:

    Total Time Taken: (?<value>[\d.]+)\s*(?<unit>[^,\s}]+)
    
    • [\d.]+ means "a digit or a dot, once or several times".

    • \s* would match optional spaces, just in case the unit is written directly after the digits and dots.

    • [^,\s}]+ means "not a comma, space or closing brace, once or several times".

    Test it here with several examples: https://regex101.com/r/llI4yQ/4
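
As a rough illustration of why separating value and unit can be useful, this Python sketch normalises the duration to milliseconds. The conversion table and the function name time_taken_ms are hypothetical: the log in the question only ever shows "ms", and the "s" unit is my guess from above. Again, Python writes the groups as (?P<value>...) and (?P<unit>...).

```python
import re

# Python spelling of the value/unit pattern; rex would use (?<value>...) and (?<unit>...).
PATTERN = re.compile(r"Total Time Taken: (?P<value>[\d.]+)\s*(?P<unit>[^,\s}]+)")

# Hypothetical conversion table: only "ms" appears in the original log.
TO_MS = {"ms": 1.0, "s": 1000.0}


def time_taken_ms(text):
    """Return the duration in milliseconds, or None if it cannot be determined."""
    match = PATTERN.search(text)
    if match is None:
        return None
    factor = TO_MS.get(match.group("unit"))
    if factor is None:
        return None  # unknown unit
    try:
        # [\d.]+ also accepts strings such as "1.2.3", which are not numbers.
        return float(match.group("value")) * factor
    except ValueError:
        return None


print(time_taken_ms("Execution Stopped: StudentController | Total Time Taken: 52 ms}"))  # 52.0
print(time_taken_ms("Execution Stopped: StudentController | Total Time Taken: 2.5 s}"))  # 2500.0
```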

    Without capturing groups, but with lookarounds

    If you can't use capturing groups, you can use a lookbehind, provided the regex engine supports it. The idea is that the match contains only the value, and only when the label comes right before it:

    (?<=Total Time Taken: )[^,}]+
    

    Test it here: https://regex101.com/r/llI4yQ/6
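
To see that the lookbehind and the named group give the same text, here is a small Python sketch (Python's re module supports fixed-width lookbehinds such as this one; the shortened log line is just for illustration):

```python
import re

# Shortened version of the sample event, for illustration only.
LOG = "Execution Stopped: StudentController | Total Time Taken: 52 ms}"

# Lookbehind: the label must come right before the match but is not part of it.
lookbehind_value = re.search(r"(?<=Total Time Taken: )[^,}]+", LOG).group(0)

# Named group (Python spelling): the label is matched, the group holds the value.
group_value = re.search(
    r"Total Time Taken: (?P<TotalTimeTaken>[^,}]+)", LOG
).group("TotalTimeTaken")

print(lookbehind_value == group_value, repr(lookbehind_value))  # True '52 ms'
```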

    Splunk Documentation

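    The Splunk documentation page about rex describes the expression as a PCRE regular expression (as far as I can tell, the RE2 engine is only mentioned for some SPL2 pipeline products).

    So lookbehinds should be available, but rex creates a field only from a named capturing group, so the named-group version is the one to use here.

    I also see two notations. As far as I can see, the slash notation /.../ belongs to SPL2, while in classic SPL the regex goes between double quotes. Also, field= names the field to search, not the index; it defaults to _raw (the whole event), so it can be left out:

    index="xyz" | rex "Total Time Taken: (?<TotalTimeTaken>[^,}]+)"
    

    This should return a field called "TotalTimeTaken" containing the value and the unit, for example "52 ms". For a timechart you will likely want a numeric field, so capture only the number, as in the value/unit pattern above.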