This question is related to this other question I asked earlier today: Find and replace text with all-inclusive wild card
I have a text file like this:
I want= to keep this
This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1 WebServices and some more "text" that" should "</be> </deleted>
<this is stuff in tags I want=to begone> and other text I want gone too. </this is stuff in tags I want to begone>
A novice programmer walked into a "BAR2" descript keepthis
and this even more text, let's keep it
<I actually want this>
and this= too.
When I use sed -f script.sed file.txt to run this script:
# Check for "aff"
/\baff\b/ {
# Define a label "a"
:a
# If the line does not contain "desc"
/\bdesc\b/!{
# Get the next line of input and append
# it to the pattern buffer
N
# Branch back to label "a"
ba
}
# Replace everything between aff and desc
s/\(\baff\)\b.*\b\(desc\b\)/\1TEST DATA\2/
}
I get this as my output:
I want= to keep this
This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1 WebServices and some more "text" that" should "</be> </deleted>
<this is stuff in tags I want=to begone> and other text I want gone too. </this is stuff in tags I want to begone>
A novice programmer walked into a "BAR2" descript keepthis
and this even more text, let's keep it
<I actually want this>
and this= too.
However, simply changing the search strings from aff and desc to FOO1 and BAR2:
# Check for "FOO1"
/\bFOO1\b/ {
# Define a label "a"
:a
# If the line does not contain "BAR2"
/\bBAR2\b/!{
# Get the next line of input and append
# it to the pattern buffer
N
# Branch back to label "a"
ba
}
# Replace everything between FOO1 and BAR2
s/\(\bFOO1\)\b.*\b\(BAR2\b\)/\1TEST DATA\2/
}
gives the expected output:
I want= to keep this
This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1TEST DATABAR2" descript keepthis
and this even more text, let's keep it
<I actually want this>
and this= too.
I am completely stumped about what is going on here. Why should searching between FOO1 and BAR2 work differently from the exact same script with aff and desc?
The end marker should be \bdesc instead of \bdesc\b.
Note the \b
in the pattern, it matches a word boundary. Your above text contains the word description, but not desc.
Your previous question made me assume that you wanted that. If you don't care about word boundaries, remove the \b escape sequences completely.
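To see the difference concretely, here is a minimal sketch. It assumes GNU sed (\b is a GNU extension, not POSIX) and uses a shortened, made-up three-line sample rather than your full file; the variable names are only for illustration.

```shell
# Assumption: GNU sed. A short stand-in for the original file.
input='keep aff FOO1 drop
drop this line
"BAR2" descript keepthis'

# \bdesc\b needs a word boundary on BOTH sides of "desc". In "descript"
# the character after "desc" is "r", so no line matches and nothing prints.
strict=$(printf '%s\n' "$input" | sed -n '/\bdesc\b/p')

# \bdesc only needs a boundary BEFORE "desc", so "descript" matches.
loose=$(printf '%s\n' "$input" | sed -n '/\bdesc/p')

# The script from the question with the end marker relaxed to \bdesc.
fixed=$(printf '%s\n' "$input" | sed '/\baff\b/{
:a
/\bdesc/!{
N
ba
}
s/\(\baff\)\b.*\b\(desc\)/\1TEST DATA\2/
}')

printf 'strict: [%s]\n' "$strict"
printf 'loose:  [%s]\n' "$loose"
printf 'fixed:  [%s]\n' "$fixed"
```

With the relaxed end marker, the three sample lines are joined and everything between aff and desc is replaced, leaving the rest of descript in place.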