How to extract complete event using regex in one capturing group


I have the following events, from which I am trying to extract the "loc" field:

loc=/abc/flows/timespan/2021-08-10T11:35:00+00:00_2021-08-10T12:35:00+00:00/ip_initiate/10.101.10.20/data.ext

loc=\"{\"type\":1,\"namespace\":\"flows\",\"appIds\":\"10,11,12\",\"path_bar\":\"[\\\"ip_initiate=10.1.120.11\\\"]\",\"2021-08-10T11:35:00+00:00_2021-08-10T12:35:00+00:00\\/ip_initiate\\/10.1.120.11\\/http_code\\/200\",\"restrict\":null}\"", day=xyz

loc=\"{\"type\":1,\"namespace\":\"flows\",\"appIds\":\"10,11,12\",\"path_bar\":\"[\\\"ip_initiate=10.1.120.11\\\"]\",\"2021-08-10T11:35:00+00:00_2021-08-10T12:35:00+00:00\\/ip_initiate\\/10.1.120.11\\/http_code\\/200\",\"restrict\":null}\"", ip=10.10.10.10

loc=\"/timespan/2021-09-12T14:21:00/ip_responder/10.10.10.10/,country=xyz,dns=example.com,http:code=2548:111:0:0:0:0:182.25.236.2:10\"

I can extract it successfully with the regex below, which uses multiple capturing groups:

    (loc=(.*),\s)|(loc=(.*?)$)

https://regex101.com/r/dyWR2g/1

I would like to know whether it is possible to extract the complete "loc" field in just one group and, if so, what changes I need to make to the regex above.

Basically, I don't want the pipe (|) in my regex.

Thanks in advance


Solution

  • You can put the ,\s and the $ together in one group of their own, separated by |, if you don't want different group numbers.

    The value of loc is now in capture group 1.

    loc=(.*?)(?:,\s|$)
    

    The pattern matches:

    • loc= Match loc= literally
    • (.*?) Capture group 1, match as few characters as possible
    • (?:,\s|$) Non-capturing group, match either a comma followed by a whitespace char, or the end of the string
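
As a quick check of the breakdown above, here is a minimal Python sketch. The event strings are shortened, hypothetical stand-ins for the ones in the question, and the helper name extract_loc is made up for illustration.

```python
import re

# Lazy capture, then a non-capturing group that accepts either
# a comma followed by whitespace or the end of the string.
LOC_RE = re.compile(r"loc=(.*?)(?:,\s|$)")

# Shortened, hypothetical stand-ins for the events in the question.
events = [
    "loc=/abc/flows/ip_initiate/10.101.10.20/data.ext",
    'loc={"type":1,"appIds":"10,11,12","restrict":null}, day=xyz',
    "loc=/timespan/ip_responder/10.10.10.10/,country=xyz,dns=example.com",
]

def extract_loc(event):
    """Return the value of loc from capture group 1, or None if absent."""
    match = LOC_RE.search(event)
    return match.group(1) if match else None

for event in events:
    print(extract_loc(event))
```

Commas that are not followed by whitespace (as in "10,11,12" or ",country=xyz") stay inside the captured value; only a comma plus whitespace, or the end of the string, ends the match.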

    Without the | char at all, you could use a negative lookahead that stops the match right before a comma followed by a whitespace char:

    loc=((?:(?!,\s).)*)
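
A minimal Python sketch of this pipe-free variant, assuming the lookahead rejects a comma followed by whitespace (the same ,\s delimiter used in the question). The sample strings and the helper name extract_loc_tempered are hypothetical.

```python
import re

# Each character is consumed only if a comma followed by whitespace
# does not start at that position, so no alternation is needed.
TEMPERED_RE = re.compile(r"loc=((?:(?!,\s).)*)")

def extract_loc_tempered(event):
    """Return the value of loc from capture group 1, or None if absent."""
    match = TEMPERED_RE.search(event)
    return match.group(1) if match else None

# Shortened, hypothetical stand-ins for the events in the question.
print(extract_loc_tempered('loc={"appIds":"10,11,12","restrict":null}, ip=10.10.10.10'))
print(extract_loc_tempered("loc=/abc/flows/ip_initiate/10.101.10.20/data.ext"))
```

Because . does not match a newline by default, the capture also ends at the end of the line when no delimiter is present. The lookahead runs at every character, so this form may be slower than the lazy version on long inputs.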
    