regex perl

RegEx question- Perl- search for last instance of string


This is probably very simple for all of the regex experts out there, but I've spent enough time driving myself mad trying to find the answer on my own.

I use Doc Parser, which lets you create text parsing rules and search using a regular expression. The documentation says Perl regular expressions are supported and that the Regex 101 site is a good place to test expressions, but I have found in the past that expressions that work in Regex 101 don't always work in Doc Parser.

I am trying to create an expression that searches for the last instance of one of three strings. The three strings are:

i am sitting with after this meeting are
won't be included in your published notes
Single Signal

The input text can look three different ways, which is why I'm looking for one of three strings. Here are three examples:

Ex 1:

Single Signal

Two things I am sitting with after this meeting are...

- Words words words

Ex 2:

Single Signal

- words words words

Ex 3:

Single Signal

words words that end in won't be included in your published notes.

- words

The three phrases I'm capturing end up being the starting point for what I'm really pulling out of the text.

I have used this as my core/root expression:

(?i)(i am sitting with after this meeting are|This is for internal
use and won't be included in your published notes|Single Signal)

I have tried appending various things to the expression to make it match the last occurrence in the text:

(?i)(i am sitting with after this meeting are|This is for internal
use and won't be included in your published notes|Single Signal).*?

(?i)(i am sitting with after this meeting are|This is for internal
use and won't be included in your published notes|Single Signal)+

(?i)(i am sitting with after this meeting are|This is for internal
use and won't be included in your published notes|Single Signal){1}

This worked in Regex 101 (PCRE2 flavor) but didn't work in Doc Parser (Perl):

(?i)[^(i am sitting with after this meeting are|won't be included in your published notes|Single Signal)]+$

All help is greatly appreciated. Thank you!


Solution

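  • If you prefix your initial regex with .* (or perhaps .*\K, which drops the prefix from the overall match), there can be at most one match. Because .* is greedy, it consumes as much text as it can before backtracking, so the alternation matches the last occurrence and the correct value is captured.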

    $ perl -e '
        $t = "1a 2a 3a 1b 2b 1c 3c 2d";
        $t =~ /.*(1.|2.|3.)/;
        print "matched $1\n" 
    '
    matched 2d
    $
    

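    You may need to adjust the prefix so that it also matches newlines: by default . does not match a newline, so on multi-line input use (?s).* or [\s\S]* instead.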
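
As a rough sketch of the greedy-prefix idea applied to the three sample inputs from the question, something like the following might work. The helper names last_marker and last_marker_k are made up for illustration, and the inline (?si) modifiers are used on the assumption that Doc Parser accepts only a bare pattern with no trailing flags; whether Doc Parser honors them has not been verified here.

```perl
use strict;
use warnings;

# The three marker phrases from the question, kept as a plain string so the
# inline (?i) modifier in the patterns below applies to them.
my $markers = "i am sitting with after this meeting are"
            . "|won't be included in your published notes"
            . "|Single Signal";

# Hypothetical helper: capture the last marker in $text via a greedy prefix.
# (?s) lets . match newlines; (?i) makes the match case-insensitive.
sub last_marker {
    my ($text) = @_;
    return $text =~ /(?si).*($markers)/ ? $1 : undef;
}

# Hypothetical variant using \K (Perl 5.10+): the prefix is discarded, so the
# whole match, not just a capture group, is the last marker.
sub last_marker_k {
    my ($text) = @_;
    return $text =~ /(?si).*\K(?:$markers)/ ? $& : undef;
}

my $ex1 = "Single Signal\n\nTwo things I am sitting with after this meeting are...\n\n- Words words words\n";
my $ex2 = "Single Signal\n\n- words words words\n";
my $ex3 = "Single Signal\n\nwords words that end in won't be included in your published notes.\n\n- words\n";

print "ex1: ", last_marker($ex1), "\n";    # ex1: I am sitting with after this meeting are
print "ex2: ", last_marker($ex2), "\n";    # ex2: Single Signal
print "ex3: ", last_marker_k($ex3), "\n";  # ex3: won't be included in your published notes
```

The \K form may be the more useful one if the tool reports the whole match rather than a capture group, since the text consumed by .* is then excluded from the result.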