I have the following line of text, where I am trying to extract everything up to the first pipe character that is not enclosed in square brackets.
action=search sourcetype=audittrail [ localop | stats count | eval search_id = replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] [ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 | table earliest ] latest="$top10_drilldown_latest$" | stats values(savedsearch_name) AS search_name
Expected output:
action=search sourcetype=audittrail [ localop | stats count | eval search_id = replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] [ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 | table earliest ] latest="$top10_drilldown_latest$"
i.e. everything but the trailing | stats values(savedsearch_name) AS search_name
Following some lookaround examples, I could (nearly) get what I needed using a JavaScript regular expression:
/.*\|(?![^\[]*\])/g
But this didn't translate well into a PCRE-compatible expression that worked (plus I want to capture everything up to, but not including, the first pipe).
From what I've read, the nested square brackets in the first bracketed set may be a complication that can't be worked around? There would only be one level of nested brackets in any given set (e.g. [..[]..] or [..[]..[]..]).
I admit that I don't think I've got my head fully around positive & negative lookarounds, but any help would be greatly appreciated!
In this kind of situation, it's more efficient to match everything that isn't the delimiter than to try to split:
(?=[^|])[^][|]*(?:(\[[^][]*+(?:(?1)[^][]*)*+])[^][|]*)*
details:
(?=[^|])  # lookahead: ensure there's at least one non-pipe character at the
          # current position; the goal is to avoid an empty match.
[^][|]*   # all that isn't a bracket or a pipe
(?:
    (     # open the capture group 1: describes a bracketed part
        \[
        [^][]*+  # all that isn't a bracket (note that you don't have to care
                 # about the pipe here, you are between brackets)
        (?:
            (?1)  # refer to the capture group 1 subpattern (it's a recursion
                  # since this reference is in the capture group 1 itself)
            [^][]*
        )*+
        ]
    )     # close the capture group 1
    [^][|]*
)*
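For readers who want to try the idea outside PCRE, here is a minimal sketch in Python. The standard `re` module has no `(?1)` recursion, so this version is not the pattern above: it spells out a single level of inner brackets by hand, relying on the question's statement that nesting never goes deeper than one level. The names `SAMPLE`, `TOP_LEVEL_PART` and `first_top_level_part` are made up for illustration.

```python
import re

# The line from the question, split across literals only for readability.
SAMPLE = (
    'action=search sourcetype=audittrail '
    '[ localop | stats count | eval search_id = '
    'replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] '
    '[ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 '
    '| table earliest ] latest="$top10_drilldown_latest$" '
    '| stats values(savedsearch_name) AS search_name'
)

# Assumption: at most one level of brackets inside a bracketed part.
# The inner bracket is written out explicitly instead of using recursion.
TOP_LEVEL_PART = re.compile(r"""
    (?=[^|])          # make sure the part is not empty
    [^\[\]|]*         # anything that is not a bracket or a pipe
    (?:
        \[            # opening bracket of a bracketed part
        [^\[\]]*      # its content; pipes are allowed in here
        (?: \[ [^\[\]]* \] [^\[\]]* )*   # inner brackets, one level deep
        \]            # closing bracket
        [^\[\]|]*     # more plain text up to the next bracket or pipe
    )*
""", re.VERBOSE)


def first_top_level_part(text):
    """Return the text before the first pipe that is outside brackets."""
    m = TOP_LEVEL_PART.match(text)
    # Trailing whitespace before the pipe is trimmed for convenience.
    return m.group(0).rstrip() if m else ""


if __name__ == "__main__":
    print(first_top_level_part(SAMPLE))
```

If deeper nesting is possible, this flat pattern stops being correct and a recursion-capable engine (as in the PCRE pattern above) is needed.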
If you need empty parts too, you can rewrite it like this:
(?=[^|])[^][|]*(?:(\[[^][]*+(?:(?1)[^][]*)*+])[^][|]*)*|(?<=\|)
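Where no recursion-capable regex engine is available, the same split (empty parts included) can be approximated without a regex at all. The sketch below is a swapped-in technique, not the pattern above: a plain character scan with a bracket depth counter. The function name `split_top_level` is hypothetical, and it assumes brackets are balanced.

```python
def split_top_level(text, delimiter="|"):
    """Split text on delimiter, ignoring delimiters inside square brackets."""
    parts = []
    current = []
    depth = 0  # number of '[' currently open

    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1

        if ch == delimiter and depth == 0:
            # A top-level delimiter ends the current part (possibly empty).
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)

    parts.append("".join(current))  # the last part, after the final delimiter
    return parts


if __name__ == "__main__":
    for part in split_top_level("a [b | c [d | e] f] g||h"):
        print(repr(part))
```

Because the counter tracks depth rather than a fixed pattern, this handles any nesting level, at the cost of not being usable where only a regex is accepted.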