bash sed bsd

Sed delete everything between 2 patterns, but not including the patterns


I have found a few examples on this, but none that work exactly how I would like.

I would like to delete everything between one pattern and any of several other possible patterns, without deleting the patterns themselves. Each pattern pair occurs within a single line, never across multiple lines.

For example:

:1543453 Brown Fox
:789 123456 Cat
:abcdef Yellow Duck

to

:Brown Fox
:Cat
:Yellow Duck

So the first pattern to match is ":" and the second is "Brown" OR "Cat" OR "Yellow".


Solution

  • There's brute force and ignorance, which works well on occasion:

    sed -e 's/^:.* Brown/:Brown/' \
        -e 's/^:.* Cat/:Cat/' \
        -e 's/^:.* Yellow/:Yellow/' \
        data-file.txt
    

    You might be able to use 'extended regular expressions' with the -E option (BSD, macOS, and recent GNU sed) or the -r option (GNU sed):

    sed -E 's/^:.* (Brown|Cat|Yellow)/:\1/' data-file.txt
    

    Both produce the desired output on the sample data.
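One way to check that claim is to run both commands on the sample data and compare the results. This is only a sketch: the temporary file and the variable names `multi` and `single` are made up for illustration, and it assumes a sed that accepts -E.

```shell
# Recreate the sample input from the question in a temporary file.
data_file=$(mktemp)
printf '%s\n' ':1543453 Brown Fox' ':789 123456 Cat' ':abcdef Yellow Duck' > "$data_file"

# One substitution per keyword.
multi=$(sed -e 's/^:.* Brown/:Brown/' \
            -e 's/^:.* Cat/:Cat/' \
            -e 's/^:.* Yellow/:Yellow/' "$data_file")

# A single substitution using ERE alternation.
single=$(sed -E 's/^:.* (Brown|Cat|Yellow)/:\1/' "$data_file")

if [ "$multi" = "$single" ]; then
    echo "outputs match"
else
    echo "outputs differ"
fi
printf '%s\n' "$single"

rm -f "$data_file"
```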

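    Note that .* is 'greedy': it matches as much as it can, so when a keyword appears more than once on a line, everything up to the last occurrence is deleted. Given the input file: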

    :1543453 Brown Fox
    :789 123456 Cat
    :abcdef Yellow Duck
    :quantum mechanics eat Yellow Ducks for being yellow (but leave Yellow Dafodils alone)
    

    both scripts produce:

    :Brown Fox
    :Cat
    :Yellow Duck
    :Yellow Dafodils alone)
    

    Standard sed regular expressions have no non-greedy quantifier, so you'd need Perl, a sed enhanced with PCRE (Perl-Compatible Regular Expressions), or some other program to avoid the greediness. For example:

    $ perl -n -e 'print if s/^:.*? (Brown|Cat|Yellow)/:$1/' data-file.txt
    :Brown Fox
    :Cat
    :Yellow Duck
    :Yellow Ducks for being yellow (but leave Yellow Dafodils alone)
    $
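If Perl is not available, awk might serve as the "other program". The sketch below relies on awk's match() reporting the leftmost match, which gives the same effect as a non-greedy .* here. The function name `strip_to_first_keyword` is a hypothetical helper, not something from the question, and unlike the Perl one-liner it also prints lines that contain no keyword, as sed would.

```shell
# Hypothetical helper: keep ":" plus everything from the first keyword onward.
strip_to_first_keyword() {
    awk '
    /^:/ && match($0, / (Brown|Cat|Yellow)/) {
        # RSTART is the position of the space before the leftmost keyword.
        $0 = ":" substr($0, RSTART + 1)
    }
    { print }   # lines without a match pass through unchanged
    ' "$@"
}

# Reads standard input when no file name is given.
printf '%s\n' \
    ':1543453 Brown Fox' \
    ':quantum mechanics eat Yellow Ducks for being yellow (but leave Yellow Dafodils alone)' |
    strip_to_first_keyword
```

To process the file from the examples above, call it as `strip_to_first_keyword data-file.txt`.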