
Match regex named group up until optional word


I have the following strings and want to extract some information from them using a regex.

Vuln : Upgrade pypi://onnx from 1.12.0 to 1.13.0 Final

Vuln : Upgrade gav://com.google.guava:guava from 22.0 to 32.0.0-android for github.com/blah/blah (Need to capture gav://com.google.guava:guava 22.0 32.0.0-android)

Vuln : Upgrade gav://org.apache.avro:avro from 1.11.1 for Android to 1.11.3 Final for github.com/blah/blah (Need to capture gav://org.apache.avro:avro 1.11.1 for Android 1.11.3 Final)

Specifically, I need to capture the library name and the two versions, for example pypi://onnx, 1.12.0 and 1.13.0 Final. I'm using Splunk, where the capture groups can become variables. All of these strings are dynamic, so they will not always look like the ones above.

I've been having difficulty crafting a regex that stops the moment the word for is encountered after the fix version, since that trailing part is optional.

This is the one I've tried:

Vuln : Upgrade\s*(?<vulnNameFromTo>.+)\sfrom\s*(?<vulnCurrentVersionFromTo>.+)\sto\s(?<vulnFixVersionFromTo>.+)(?:\sfor\s)?

But the last named capture group grabs everything, including the for and whatever follows it, and that's not what I want.
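
The over-capture can be reproduced outside Splunk with a short Python sketch. Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the pattern is adapted for that; otherwise it is unchanged. The sample line is the second string from above without the parenthesised note.

```python
import re

# The attempted pattern, adapted to Python's (?P<name>...) group syntax.
ATTEMPT = re.compile(
    r"Vuln : Upgrade\s*(?P<vulnNameFromTo>.+)\sfrom\s*"
    r"(?P<vulnCurrentVersionFromTo>.+)\sto\s"
    r"(?P<vulnFixVersionFromTo>.+)(?:\sfor\s)?"
)

line = (
    "Vuln : Upgrade gav://com.google.guava:guava from 22.0 "
    "to 32.0.0-android for github.com/blah/blah"
)

match = ATTEMPT.search(line)
fix_version = match.group("vulnFixVersionFromTo")

# The greedy .+ runs to the end of the line, and the optional
# (?:\sfor\s)? group is then satisfied by matching nothing at all.
print(fix_version)  # 32.0.0-android for github.com/blah/blah
```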


Solution

  • You can use

    Vuln : Upgrade\s*(?<vulnNameFromTo>.*?)\s+from\s+(?<vulnCurrentVersionFromTo>.*?)\s+to\s+(?<vulnFixVersionFromTo>.+?)(?:\s+for\b.*)?$
    

    See the regex demo.

    Details:

    • Vuln : Upgrade - a literal text
    • \s* - zero or more whitespace chars
    • (?<vulnNameFromTo>.*?) - Group "vulnNameFromTo": any zero or more chars other than line break chars, as few as possible
    • \s+from\s+ - the word from surrounded by one or more whitespace chars
    • (?<vulnCurrentVersionFromTo>.*?) - Group "vulnCurrentVersionFromTo": any zero or more chars other than line break chars, as few as possible
    • \s+to\s+ - the word to surrounded by one or more whitespace chars
    • (?<vulnFixVersionFromTo>.+?) - Group "vulnFixVersionFromTo": any one or more chars other than line break chars, as few as possible
    • (?:\s+for\b.*)? - an optional sequence of one or more whitespace chars, for, a word boundary and then any zero or more chars other than line break chars, as many as possible
    • $ - end of string.
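
As a quick check outside Splunk, the pattern can be run against the three sample strings from the question. This is a minimal sketch in Python, not Splunk syntax: Python's re module requires (?P<name>...) for named groups, so the group syntax is adapted, and the extract helper and SAMPLES list are hypothetical names used only for illustration. The parenthesised "Need to capture" notes are left out of the inputs.

```python
import re

# Same pattern as above, with (?<name>...) rewritten as (?P<name>...) for Python.
PATTERN = re.compile(
    r"Vuln : Upgrade\s*(?P<vulnNameFromTo>.*?)\s+from\s+"
    r"(?P<vulnCurrentVersionFromTo>.*?)\s+to\s+"
    r"(?P<vulnFixVersionFromTo>.+?)(?:\s+for\b.*)?$"
)

SAMPLES = [
    "Vuln : Upgrade pypi://onnx from 1.12.0 to 1.13.0 Final",
    "Vuln : Upgrade gav://com.google.guava:guava from 22.0 "
    "to 32.0.0-android for github.com/blah/blah",
    "Vuln : Upgrade gav://org.apache.avro:avro from 1.11.1 for Android "
    "to 1.11.3 Final for github.com/blah/blah",
]


def extract(line):
    """Return the three named groups as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


results = [extract(sample) for sample in SAMPLES]
for result in results:
    print(result)
```

The lazy quantifiers make each group stop at the first place where the rest of the pattern can match, which is why "1.11.1 for Android" survives in the current version (it is followed by to, not by the end of the line). One caveat that follows from the same logic: a fix version that itself contains a standalone for, such as "1.11.3 for Android for github.com/blah/blah", would be cut at the first for.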