In a multiline string, I want to delete, on each line, everything from the first unescaped percent sign to the end of the line, with one exception: if the unescaped percent sign occurs in the position \d\d:\d\d%:\d\d, I want to leave it alone.
(The string is LaTeX / TeX code and the percent sign denotes a comment. I want to treat a comment inside an HH:MM:SS string as a special case, where seconds were commented out of a time string.)
The code below almost manages to do it: it leaves \% alone, and it leaves \d\d:\d\d% alone, but it does not distinguish between \d\d:\d\d%anything and \d\d:\d\d%:\d\d, skipping both.
#!/usr/bin/perl
use strict; use warnings;
my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything %delete-me
for 10 percent%delete-me';
print "original string:\n";
print "$string<<\n";
{
my $tochange = $string;
$tochange =~ s/
(^.*?
(?<!\\)
)
(\%.*)
$/${1}/mgx;
print "\ndelete after any unescaped %\n";
print "$tochange<<\n";
}
{
my $tochange = $string;
$tochange =~ s/
(^.*?
(?<!\d\d:\d\d)
(?<!\\)
)
(\%.*)
$/${1}/mgx;
print "\nexception for preceding HH:MM\n";
print "$tochange<<\n";
}
{
my $tochange = $string;
$tochange =~ s/
(^.*?
(?<!\d\d:\d\d)
(?<!\\)
)
(!?:\d\d)
(\%.*)
$/${1}/mgx;
print "\nattempt to add negative lookahead\n";
print "$tochange<<\n";
}
{
my $tochange = $string;
# attempt to add negative lookahead
$tochange =~ s/
(^.*?
(?<!\d\d:\d\d)
(?<!\\)
)
(\%.*)
(!?:\d\d)
$/${1}/mgx;
print "\nattempt to add negative lookahead\n";
print "$tochange<<\n";
}
You might make use of the SKIP FAIL approach:
\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*
\d\d:\d\d%:\d\d(*SKIP)(*FAIL)| : match the pattern that you want to avoid, then skip it
(?<!\\)%.* : negative lookbehind, assert no \ directly to the left, then match % followed by the rest of the line
For example
$tochange =~ s/\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*//g;
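To check the substitution against the sample string from the question, here is a minimal, self-contained sketch. The helper names strip_comments_skip_fail and strip_comments_lookaround are hypothetical, introduced only for illustration. The second helper is not part of the approach above: it is a lookaround-only alternative, shown because the question's attempts were aiming at a negative lookahead. Note that the question's (!?:\d\d) is a capturing group that matches an optional "!" followed by ":" and two digits; a negative lookahead is spelled (?!...).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input from the question; single quotes keep the backslashes literal.
my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything %delete-me
for 10 percent%delete-me';

# Hypothetical helper: the SKIP/FAIL substitution from above.
# The first alternative consumes HH:MM%:SS and then discards that match,
# so the % inside it is never offered to the second alternative.
sub strip_comments_skip_fail {
    my ($text) = @_;
    $text =~ s/\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*//g;
    return $text;
}

# Hypothetical alternative (an assumption, not from the answer above):
# match an unescaped % unless it is both preceded by HH:MM and followed by :SS.
sub strip_comments_lookaround {
    my ($text) = @_;
    $text =~ s/(?<!\\)%(?!(?<=\d\d:\d\d%):\d\d).*//g;
    return $text;
}

# Print each resulting line followed by "|" so trailing spaces stay visible.
print "$_|\n" for split /\n/, strip_comments_skip_fail($string);
# for 10\% and |
# for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE |
# for 10\% and 2021-03-09 Tue 04:09|
# for 10 percent|
```

Both helpers return the same result for this input. Neither treats a percent sign after an escaped backslash (\\%) as a comment start, since the lookbehind only inspects the single character to the left; that case would need extra handling if it occurs in your TeX source.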