perl, regex-lookarounds, negative-lookbehind

How can I perform both negative lookahead and negative lookbehind in a single Perl regex?


In a multiline string, I want to delete, on each line, everything from the first unescaped percent sign to the end of the line, with one exception: if the unescaped percent sign occurs in the position \d\d:\d\d%:\d\d, I want to leave it alone.

(The string is LaTeX / TeX code and the percent sign denotes a comment. I want to treat a comment inside an HH:MM:SS string as a special case, where seconds were commented out of a time string.)

The code below almost manages to do it:

  1. It uses one negative lookbehind to leave \% alone.
  2. It uses an "ungreedy" quantifier to match the first, not the last, %.
  3. It uses another negative lookbehind to skip \d\d:\d\d%.
  4. BUT it fails to differentiate between \d\d:\d\d%anything and \d\d:\d\d%:\d\d, skipping both.

My attempts at adding a negative lookahead do not help. Is there a way to do this?

#!/usr/bin/perl
use strict; use warnings;

my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything  %delete-me
for 10 percent%delete-me';

print "original string:\n";
print "$string<<\n";

{
    my $tochange = $string;
    $tochange =~ s/
        (^.*?
        (?<!\\)
        )
        (\%.*)
        $/${1}/mgx;
    print "\ndelete after any unescaped %\n";
    print "$tochange<<\n";
}

{
    my $tochange = $string;
    $tochange =~ s/
        (^.*?
        (?<!\d\d:\d\d)
        (?<!\\)
        )
        (\%.*)
        $/${1}/mgx;
    print "\nexception for preceding HH:MM\n";
    print "$tochange<<\n";
}

{
    my $tochange = $string;
    $tochange =~ s/
        (^.*?
        (?<!\d\d:\d\d)
        (?<!\\)
        )
        (!?:\d\d)
        (\%.*)
        $/${1}/mgx;
    print "\nattempt to add negative lookahead\n";
    print "$tochange<<\n";
}


{
    my $tochange = $string;
    # attempt to add negative lookahead
    $tochange =~ s/
        (^.*?
        (?<!\d\d:\d\d)
        (?<!\\)
        )
        (\%.*)
        (!?:\d\d)
        $/${1}/mgx;
    print "\nattempt to add negative lookahead\n";
    print "$tochange<<\n";
}


Solution

  • You might use the (*SKIP)(*FAIL) approach:

    \d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*
    
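    • \d\d:\d\d%:\d\d(*SKIP)(*FAIL)| Match the pattern that you want to keep, then force the match to fail; (*SKIP) makes the next match attempt start after the skipped text, so the % inside it is never tried on its own
    • (?<!\\)%.* Negative lookbehind: assert that there is no \ directly to the left, then match % followed by the rest of the line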


    For example, as a substitution:

    $tochange =~ s/\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*//g;
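
To check the substitution against the sample string from the question, a minimal self-contained sketch follows. The helper names strip_skip_fail and strip_lookaround are hypothetical and do not come from the original post. The second helper is an assumption added for comparison: it tries to express the same rule with lookarounds only, by nesting a lookbehind inside a negative lookahead, which is closer to what the question title asks for. The (*SKIP) and (*FAIL) verbs need Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper names; neither appears in the original post.

# (*SKIP)(*FAIL) variant from the answer: match an HH:MM%:SS run and
# discard it, otherwise delete from an unescaped % to the end of the line.
sub strip_skip_fail {
    my ($text) = @_;
    $text =~ s/\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*//g;
    return $text;
}

# Lookaround-only variant (an assumption, not part of the answer).
# After matching an unescaped %, the negative lookahead rejects the match
# only when HH:MM% sits just behind the position and :SS just ahead of it.
sub strip_lookaround {
    my ($text) = @_;
    $text =~ s/(?<!\\)%(?!(?<=\d\d:\d\d%):\d\d).*//g;
    return $text;
}

# Sample input copied from the question.
my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything  %delete-me
for 10 percent%delete-me';

print "SKIP/FAIL variant:\n",  strip_skip_fail($string),  "<<\n\n";
print "lookaround variant:\n", strip_lookaround($string), "<<\n";
```

On this input both variants should keep `02:59%:02 NO DELETE ` on the second line and cut the third line right after `04:09`. Neither pattern treats `\\%` (an escaped backslash followed by a comment) specially, so input containing that sequence would need a stricter lookbehind.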