I have found a few examples of this, but none that work exactly how I would like.
I would like to delete everything between one pattern and any of several other possible patterns, but not the patterns themselves. Each pattern pair occurs within a single line, never across multiple lines.
For example:
:1543453 Brown Fox
:789 123456 Cat
:abcdef Yellow Duck
should become:
:Brown Fox
:Cat
:Yellow Duck
So the first pattern to match is the ":" and the second is "Brown" OR "Cat" OR "Yellow".
There's brute force and ignorance, which works well on occasion:
sed -e 's/^:.* Brown/:Brown/' \
-e 's/^:.* Cat/:Cat/' \
-e 's/^:.* Yellow/:Yellow/' \
data-file.txt
You might be able to use 'extended regular expressions' with the -E option (BSD, macOS, and current GNU sed) or the -r option (GNU sed only):
sed -E 's/^:.* (Brown|Cat|Yellow)/:\1/' data-file.txt
Both produce the desired output on the sample data.
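To check that claim on your own system, here is a minimal sketch that recreates the sample data and compares the two variants. The temporary file and the variable names out_bre and out_ere are illustrative choices, not part of the original question, and it assumes a sed that accepts -E and a system that provides mktemp.

```shell
#!/bin/sh
# Recreate the sample input in a temporary file (the name is arbitrary).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
:1543453 Brown Fox
:789 123456 Cat
:abcdef Yellow Duck
EOF

# Variant 1: one substitution per keyword (basic regular expressions).
out_bre=$(sed -e 's/^:.* Brown/:Brown/' \
              -e 's/^:.* Cat/:Cat/' \
              -e 's/^:.* Yellow/:Yellow/' "$tmp")

# Variant 2: a single alternation (extended regular expressions).
out_ere=$(sed -E 's/^:.* (Brown|Cat|Yellow)/:\1/' "$tmp")

printf '%s\n' "$out_ere"

# Report whether the two variants agree.
if [ "$out_bre" = "$out_ere" ]; then
    echo "outputs match"
fi
rm -f "$tmp"
```

If your sed rejects -E, try -r instead, or fall back to the first variant, which needs no extended syntax.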
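Note that the .* used is 'greedy': it matches as much as possible, so the deletion runs up to the last occurrence of a keyword on the line, not the first. Given the input file: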
:1543453 Brown Fox
:789 123456 Cat
:abcdef Yellow Duck
:quantum mechanics eat Yellow Ducks for being yellow (but leave Yellow Dafodils alone)
both sed scripts produce:
:Brown Fox
:Cat
:Yellow Duck
:Yellow Dafodils alone)
You'd need Perl, a sed enhanced with PCRE (Perl-Compatible Regular Expressions), or some other program to avoid the greediness. For example:
$ perl -n -e 'print if s/^:.*? (Brown|Cat|Yellow)/:$1/' data-file.txt
:Brown Fox
:Cat
:Yellow Duck
:Yellow Ducks for being yellow (but leave Yellow Dafodils alone)
$
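If Perl is not available, awk is one possible "other program". The following is only a sketch under that assumption, not something from the original answer: it relies on match() reporting the leftmost occurrence of the pattern, which gives the same effect as a non-greedy match here. The temporary file and the variable name out_awk are illustrative.

```shell
#!/bin/sh
# Sample input, including the line with several keyword occurrences.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
:1543453 Brown Fox
:789 123456 Cat
:abcdef Yellow Duck
:quantum mechanics eat Yellow Ducks for being yellow (but leave Yellow Dafodils alone)
EOF

# match() sets RSTART to the position of the leftmost " Brown", " Cat" or
# " Yellow" (that is, to the position of the space). Keep everything after
# that space and put ":" in front of it.
out_awk=$(awk '
    /^:/ && match($0, / (Brown|Cat|Yellow)/) { $0 = ":" substr($0, RSTART + 1) }
    { print }
' "$tmp")

printf '%s\n' "$out_awk"
rm -f "$tmp"
```

Lines that do not start with ":" or contain no keyword are printed unchanged, which is how the sed commands behave.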