regex, regex-greedy

Regex match last occurrence of a string from multiple lines


I'm trying to match the last occurrence of a string in a log file.

[03/03/2019 09:16:36] Moving message 123456789 from NEW to PENDING
[03/03/2019 09:16:36] Retrieving file(s) of type DATAWAREHOUSE for 123456
[03/03/2019 09:16:36] collecting warehouse version 7.3.1 files for 123456...
[03/03/2019 09:16:37] Moving message 123456789 from NEW to PENDING
[03/03/2019 09:16:37] Retrieving file(s) of type DATAWAREHOUSE for 123456
[03/03/2019 09:16:37] collecting warehouse version 7.3.1 files for 123456...
[03/03/2019 09:16:38] Moving message 123456789 from NEW to PENDING
[03/03/2019 09:16:39] Retrieving file(s) of type DATAWAREHOUSE for 123456
[03/03/2019 09:16:40] collecting warehouse version 7.3.1 files for 123456...

Above is a sample from the log file, which contains three occurrences of the following string:

Moving message 123456789 from NEW to PENDING

I need to match the last occurrence to get its timestamp, "[03/03/2019 09:16:38]". When all the entries are on a single line, a greedy approach (.*) works fine, but when they are spread across multiple lines it doesn't work. I haven't tried the multiline (m) flag as I'm not sure how to use it. Can someone please help me construct a regex to retrieve the timestamp of this last occurrence? Example: https://regex101.com/r/fnwPsB/1


Solution

  • Here is a solution that uses a negative lookahead and does not depend on PCRE-specific features:

    (?s)\[(\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}:\d{2})\] Moving message 123456789 from NEW to PENDING(?!.* Moving message 123456789 from NEW to PENDING)
    

    The date-time is available in the first capture group.

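    Here (?!.* Moving message 123456789 from NEW to PENDING) is a negative lookahead that ensures we match only the very last occurrence of the given pattern: the match fails wherever the same text appears again later in the input. The (?s) flag makes . match newlines as well, so the lookahead can scan across the remaining lines.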
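For anyone who wants to try the pattern outside regex101, here is a minimal sketch in Python. The question doesn't say which language or tool is being used, so Python is an assumption, and the names LOG, TARGET, PATTERN and last_timestamp are made up for illustration. The log text is the sample from the question.

```python
import re

# Sample log copied from the question.
LOG = """\
[03/03/2019 09:16:36] Moving message 123456789 from NEW to PENDING
[03/03/2019 09:16:36] Retrieving file(s) of type DATAWAREHOUSE for 123456
[03/03/2019 09:16:36] collecting warehouse version 7.3.1 files for 123456...
[03/03/2019 09:16:37] Moving message 123456789 from NEW to PENDING
[03/03/2019 09:16:37] Retrieving file(s) of type DATAWAREHOUSE for 123456
[03/03/2019 09:16:37] collecting warehouse version 7.3.1 files for 123456...
[03/03/2019 09:16:38] Moving message 123456789 from NEW to PENDING
[03/03/2019 09:16:39] Retrieving file(s) of type DATAWAREHOUSE for 123456
[03/03/2019 09:16:40] collecting warehouse version 7.3.1 files for 123456...
"""

# Hypothetical helper names; only the regex itself comes from the answer.
TARGET = "Moving message 123456789 from NEW to PENDING"

PATTERN = re.compile(
    r"(?s)"                                        # let . match newlines too
    r"\[(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\] "  # group 1: the timestamp
    + re.escape(TARGET)
    + r"(?!.* " + re.escape(TARGET) + r")"         # no later occurrence allowed
)


def last_timestamp(log):
    """Return the timestamp of the last TARGET line, or None if absent."""
    match = PATTERN.search(log)
    return match.group(1) if match else None


print(last_timestamp(LOG))  # 03/03/2019 09:16:38
```

If the lookahead turns out to be slow on a very large log, a different technique that should give the same result is to collect every match with re.findall (dropping the lookahead) and take the last element.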