
perl regex negative lookahead replacement with wildcard


Update: here is the real, more complicated task:

The problem is to replace a certain pattern with a different result depending on whether or not the text has _ndm.

The input is a text file with lines like:

/<random path>/PAT1/<txt w/o _ndm>
/<random path>/PAT1/<txt w/ _ndm>

I need to change those to:

/<random path>/PAT1/<sub1>
/<random path>/PAT1/<sub2>_ndm

I wrote a Perl command to process it:

perl -i -pe 's#PAT1/.*_ndm#<sub2>_ndm#; s#PAT1/.*(?!_ndm)#<sub1>#' <input_file>

However, it doesn't work as expected. They are all substituted to <sub1> instead of <sub2>_ndm.
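Two things go wrong in that command. In the second substitution, `.*` first runs to the end of the line, and only then is `(?!_ndm)` checked, against an empty remainder, so it always succeeds. In the first substitution, `PAT1/` is part of the matched text, so it is replaced along with the name. A minimal sketch of the same two-substitution idea with both issues addressed, assuming _ndm sits at the end of the final path component and that `<sub1>`/`<sub2>` are literal placeholders (`rewrite_line` is a hypothetical helper introduced only for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: the question's two substitutions, with the lookahead
# tested right after /PAT1/, before anything of the name has been consumed.
sub rewrite_line {
    my ($line) = @_;
    # Non-_ndm case: refuse to match if the final component ends in _ndm,
    # otherwise replace that whole component with <sub1>.
    $line =~ s#(/PAT1/)(?![^/]*_ndm$)[^/]+$#$1<sub1>#;
    # _ndm case: capture /PAT1/ and put it back so it is not lost.
    $line =~ s#(/PAT1/)[^/]*_ndm$#$1<sub2>_ndm#;
    return $line;
}

for my $path ('/foo/bar/PAT1/veeblefetzer', '/baz/PAT1/quux_ndm') {
    my $out = rewrite_line($path);
    print "$out\n";    # /foo/bar/PAT1/<sub1>, then /baz/PAT1/<sub2>_ndm
}
```

For in-place editing, the same two substitutions can go into `perl -i -lpe '...' <input_file>`. The `-l` switch matters here: it strips the trailing newline first, because `[^/]+$` would otherwise also match the `\n` and remove it.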


Original post:

I need to use a shell command to replace any string not ending with _ndm with another string (this is an example):

abc
def
def_ndm

to

ace
ace
def_ndm

I tried this Perl command:

perl -pe 's/.*(?!_ndm)/ace/'

However, I found that the wildcard didn't work with the negative lookahead as I expected. Only if I include the wildcard inside the negative lookahead can it skip def_ndm correctly; but because a negative lookahead is zero-width, it then no longer replaces the normal strings.
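The behaviour comes from the order in which the engine works: `.*` first matches the whole string, and at the end of the string `(?!_ndm)` sees an empty remainder, so it succeeds for every line, including def_ndm. Anchoring the lookahead at the start with `^` and letting it scan the whole line with `.*` tests the condition before anything is consumed. A minimal sketch on the example input, assuming _ndm only matters at the end of the line (`replace_unless_ndm` is a made-up helper name):

```perl
use strict;
use warnings;

# Hypothetical helper: replace the whole string with "ace" unless it ends in _ndm.
sub replace_unless_ndm {
    my ($s) = @_;
    # ^(?!.*_ndm$) is checked at position 0, before .* consumes anything,
    # so strings ending in _ndm fail here and are left unchanged.
    $s =~ s/^(?!.*_ndm$).*/ace/;
    return $s;
}

# The original pattern: .* runs to the end, where (?!_ndm) always succeeds.
my $naive = 'def_ndm';
$naive =~ s/.*(?!_ndm)/ace/;
print "naive: $naive\n";    # naive: ace

for my $s (qw(abc def def_ndm)) {
    my $out = replace_unless_ndm($s);
    print "$out\n";    # ace, ace, def_ndm
}
```

As a one-liner this is `perl -pe 's/^(?!.*_ndm$).*/ace/'`; `-l` is not needed here because `.` does not match the trailing newline.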

Any ideas?


Solution

  • Daisy-chained matches

    Force PAT1 to be the parent directory and then see whether the final name ends in _ndm. Note the need to squirrel away values of match variables $1 and $2 into $a and $b because the second match (against /_ndm$/) wipes out the previous capture variables.

    while (<>) {
      chomp;
      if (s#(/PAT1/)([^/]+)$#($a,$b) = ($1, $2); $a . ($b =~ /_ndm$/ ? "sub2_ndm" : "sub1")#e) {
        print $_, "\n";
      }
      else {
        print "No match: $_\n";
      }
    }
    


  • Check which alternative matched

    Find PAT1 as above, then give two alternatives for how the match can succeed. Note the use of named capture groups to avoid having to count left parentheses, and the fixed-width look-behind assertions: a positive one, (?<=_ndm), for the _ndm case and a negative one, (?<!_ndm), for the other case. Strictly speaking, only one of the look-behind assertions is necessary, but you may prefer keeping both to emphasize what’s happening.

    while (<>) {
      chomp;
      if (s{(?<prefix>/PAT1/) (?: (?<ndm>[^/]+)(?<=_ndm) | (?<nondm>[^/]+)(?<!_ndm) )$}
           {$+{prefix} . (exists $+{nondm} ? "sub1" : "sub2_ndm")}ex)
      {
        print $_, "\n";
      }
      else {
        print "No match: $_\n";
      }
    }
    


    Output

    Given the input

    /foo/bar/PAT1/veeblefetzer
    /baz/PAT1/quux_ndm
    /nope/nope
    /also/nope/PAT1/dir/dir/dir_ndm
    

    both programs print

    /foo/bar/PAT1/sub1
    /baz/PAT1/sub2_ndm
    No match: /nope/nope
    No match: /also/nope/PAT1/dir/dir/dir_ndm
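The daisy-chained version stores the captures in $a and $b, which Perl treats specially as the sort comparison variables (that is also why they pass `use strict` undeclared). A minimal variant, assuming the same sub1/sub2_ndm placeholders and a hypothetical helper named `rename_last`, copies the captures into lexical variables instead and returns undef when the line does not match. Because it builds the result rather than substituting in place, it also captures the prefix before /PAT1/:

```perl
use strict;
use warnings;

# Hypothetical helper: daisy-chained matches using lexicals instead of $a/$b.
sub rename_last {
    my ($path) = @_;
    # The final component must come directly after /PAT1/ (no further slashes).
    return unless $path =~ m#^(.*/PAT1/)([^/]+)$#;
    my ($dir, $name) = ($1, $2);    # copy before the next match resets $1/$2
    return $dir . ($name =~ /_ndm$/ ? 'sub2_ndm' : 'sub1');
}

for my $path ('/foo/bar/PAT1/veeblefetzer', '/baz/PAT1/quux_ndm', '/nope/nope') {
    my $out = rename_last($path);
    print defined $out ? "$out\n" : "No match: $path\n";
}
```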