Inverse multi-line grep in perl one-liner


I have a text file with inconsistent formatting, but the relevant sections look like:

     CDS             complement(99074..99808)
                     /note="important in cell to cell spread of the virus, a
                     tegument protein"
                     /codon_start=1

As part of an existing bash pipeline, I need to remove everything matching /note="anything" to get:

     CDS             complement(99074..99808)
                     /codon_start=1

I've tried several ways to invert the grep, but the closest one only works when the match does not span multiple lines:

perl -ne '/\/\bnote\b\="[^"]+"/||print' file.txt

I can match the strings I wish to remove by checking with the following perl one-liner, but so far I cannot combine the two methods to invert the match and remove the strings that span multiple lines:

perl -0777 -ne 'print "$1\n" while ( /(\s+\/\bnote\b\="[^"]+")/sg )' file.txt

Running the first one-liner with -0777 produces no output.


Solution

  • The simple approach is to read the entire stream into memory. To do this, tell Perl to treat the whole file as a single record with -0777, or with -g, which Perl 5.36 added as an alias for -0777.

    perl -0777pe's{^\s*/note="[^"]*"\n}{}mg'
    

    Processing one line at a time avoids holding the whole input in memory, but it is more complicated because it needs a flag to track whether we are inside the quoted /note string.

    perl -ne'
       $f ||= m{^\s*/note="};
       print if !$f;
       $f &&= !m{"$};
    '