regex pcre splunk positive-lookahead

Regex to only return matches when followed by a specific word


I have a fairly long JSON string that contains a number of results from different automation playbooks. Here is a short snippet:

"name":"update_notable_hash","responses":null,"status":"success","type":"generic"},{"action":"add work note","action_run_id":103524,"app_runs":null,"callback":{"cb_called":"yes","cb_fn_name":"hunt_file_1","cb_result":true},"close_time":"2024-05-13T08:31:54.231577+00:00","create_time":"2024-05-13T08:31:54.229+00:00","id":103524,"name":"add_work_note_2","responses":null,"status":"failed","type":""}

The two fields I care about are "name" and "status", and there are a variable number of field:value pairs in between them. There are two results in the snippet - the first was successful ("status":"success") and the second failed ("status":"failed"). I want to capture the name only when it is followed by "status":"failed".

This is as close as I've managed to get:

\"name\":\"(?<failed_block_name>[^\"]+).+?(?=\"status\":\"failed\")

It does see the "status":"failed" part correctly, but then goes all the way back to the beginning and captures the first "name" value. I thought that by making this non-greedy it would only look back to the closest "name", but it doesn't.

Any suggestions would be appreciated, because I'm really stuck on this one.

UPDATE 2024/06/05:

While the solution by @tripleee worked great for 99% of my events, I just ran across one which included quotation marks in the name. Since the capture group was using the quote as a negated character match, it fails.
So instead of [^\"] in the first capture group, I need something that will capture up to ", instead. I tried using a non-capturing group like [^(?:\",)] but had no luck.


Solution

  • As ever, don't say .+? if that's not what you mean.

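    This allows "field":value, repetitions (where the value is either a quoted string or a bare scalar such as null, as in the snippet's "responses":null) but nothing else before "status":

    "name":"(?<failed_block_name>[^"]+)",(?:"[^"]+":(?:"[^"]*"|[^",{}\[\]]+),)*"status":"failed"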
    

    I'm guessing you didn't really need to backslash the double quotes. I took out the lookahead, too; just match the whole thing but capture only the part you want to extract.
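
    To try the approach outside Splunk, here is a minimal Python sketch run against the snippet from the question. It rests on a few assumptions that are not in the original. Python's re module spells the named group (?P<name>...) rather than (?<name>...). Values between "name" and "status" are taken to be quoted strings or bare scalars such as null (the snippet contains "responses":null), never nested objects or arrays. The helper failed_names and the second pattern FAILED_NAME_QUOTED are hypothetical names used only for illustration. The second pattern is one possible way to handle the names with embedded quotation marks mentioned in the update: it consumes characters up to the next ", pair rather than stopping at the first quote. The attempted [^(?:\",)] cannot do that, because everything inside [...] is read as a set of single characters, not as a group.

```python
import re

# Sample copied from the question's snippet (two playbook results).
SAMPLE = (
    '"name":"update_notable_hash","responses":null,"status":"success",'
    '"type":"generic"},{"action":"add work note","action_run_id":103524,'
    '"app_runs":null,"callback":{"cb_called":"yes","cb_fn_name":"hunt_file_1",'
    '"cb_result":true},"close_time":"2024-05-13T08:31:54.231577+00:00",'
    '"create_time":"2024-05-13T08:31:54.229+00:00","id":103524,'
    '"name":"add_work_note_2","responses":null,"status":"failed","type":""}'
)

# Only whole "field":value, pairs may sit between the name and the status.
# Assumption: each value is a quoted string or a bare scalar (null, true, 42).
PAIRS = r'(?:"[^"]+":(?:"[^"]*"|[^",{}\[\]]+),)*'

# Name stops at the first double quote.
FAILED_NAME = re.compile(
    r'"name":"(?P<failed_block_name>[^"]+)",' + PAIRS + r'"status":"failed"'
)

# Hypothetical variant: the name may contain bare quotes and ends at the
# first `",` delimiter (each character is accepted unless `",` starts there).
FAILED_NAME_QUOTED = re.compile(
    r'"name":"(?P<failed_block_name>(?:(?!",).)+)",' + PAIRS + r'"status":"failed"'
)


def failed_names(text, pattern=FAILED_NAME):
    """Return the name of every block whose own status is "failed"."""
    return [m.group("failed_block_name") for m in pattern.finditer(text)]


print(failed_names(SAMPLE))  # ['add_work_note_2']

# Made-up event with quotation marks inside the name, as in the update.
quoted = '"name":"run "quick" scan","responses":null,"status":"failed"'
print(failed_names(quoted, FAILED_NAME_QUOTED))  # ['run "quick" scan']
```

    Because the pattern consumes whole pairs instead of using .+?, a match that starts at the first "name" can never run past "status":"success" to reach a later "status":"failed". The quoted variant would still break on a name that itself contains the two characters ", in sequence.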