My plan is to extract a group of words from a string with a regex. However, the word NOT sometimes appears
in front of a word that would otherwise be extracted, and I'm not sure how to deal with that.
Test string:
tag=os index=linux index=windows NOT index=mac tag=db index="a_something-else" NOT index=solaris
Current (failing) regex:
index=(\")?(?<my_indexes>\w+(-)?(\w+)?)(\")?
This regex extracts all index=zyx words. But the cases preceded by NOT, e.g. NOT index=mac or NOT index=solaris, should be skipped. The results should look like this:
index=linux
index=windows
index="a_something-else"
Any suggestions?
As you mention that it is PCRE, one option is to use a (*SKIP)(*FAIL) pattern, together with a capturing group and a backreference to pair up the matching double quote.
You can make the double quote optional inside the capturing group and refer to it using \1 and \2. Note that you don't have to escape the double quote by itself.
\bNOT\h+index=("?)\w+(?:-\w+)*\1(*SKIP)(*FAIL)|index=("?)\w+(?:-\w+)*\2
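To try the same idea without a PCRE engine, here is a minimal Python sketch. Python's built-in re module does not support (*SKIP)(*FAIL) or \h, so this is an emulation rather than the pattern above: the first alternative consumes the NOT index=... text, and those matches are then dropped in code. [ \t] stands in for \h, and the names TEST, PATTERN and wanted_matches are made up for the illustration.

```python
import re

TEST = ('tag=os index=linux index=windows NOT index=mac tag=db '
        'index="a_something-else" NOT index=solaris')

PATTERN = re.compile(
    # Alternative 1: consume "NOT index=..." so it cannot match later.
    r'\bNOT[ \t]+index=("?)\w+(?:-\w+)*\1'
    # Alternative 2: the wanted text, wrapped in group 2 (quote is group 3).
    r'|(index=("?)\w+(?:-\w+)*\3)'
)

def wanted_matches(text):
    # Group 2 is None whenever alternative 1 (the NOT case) matched.
    return [m.group(2) for m in PATTERN.finditer(text) if m.group(2) is not None]

print(wanted_matches(TEST))
# ['index=linux', 'index=windows', 'index="a_something-else"']
```

Because the NOT alternative comes first and consumes the whole NOT index=... span, the second alternative never gets a chance to match inside it, which is the effect (*SKIP)(*FAIL) has in PCRE.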
Explanation
\bNOT\h+
Match NOT and 1+ horizontal whitespace chars
index=("?)
Match index= and capture an optional " in group 1
\w+(?:-\w+)*\1
Match 1+ word chars, optionally repeated by - and 1+ word chars. Then a backreference to what is captured in group 1
(*SKIP)(*FAIL)|
Skip the match
index=("?)
Match index= and capture an optional " in group 2
\w+(?:-\w+)*\2
The same as the previous pattern above, now with a backreference to group 2

If you don't want the double quotes around a_something-else and only want the value after the =, you could use another capturing group, or use the named capturing group my_indexes:
\bNOT\h+index=("?)\w+(?:-\w+)*\1(*SKIP)(*FAIL)|index=("?)(?<my_indexes>\w+(?:-\w+)*)\2
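
As a rough illustration of reading only the value from the named group, here is a Python sketch. It is an emulation under assumptions, not the PCRE pattern itself: Python's re module has no (*SKIP)(*FAIL) or \h, and it spells named groups (?P<name>...), so the NOT alternative is matched first and filtered out in code. SAMPLE, NAMED and wanted_values are hypothetical names.

```python
import re

SAMPLE = ('tag=os index=linux index=windows NOT index=mac tag=db '
          'index="a_something-else" NOT index=solaris')

NAMED = re.compile(
    # Consumes the negated entries; my_indexes stays unset for these.
    r'\bNOT[ \t]+index=("?)\w+(?:-\w+)*\1'
    # Captures the bare value, without the optional surrounding quotes.
    r'|index=("?)(?P<my_indexes>\w+(?:-\w+)*)\2'
)

def wanted_values(text):
    values = []
    for m in NAMED.finditer(text):
        value = m.group('my_indexes')
        if value is not None:  # None means the NOT alternative matched
            values.append(value)
    return values

print(wanted_values(SAMPLE))
# ['linux', 'windows', 'a_something-else']
```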