string, sed, scripting, multiline

What is different about these two pairs of strings that makes this sed script work with one and not the other?


This question is related to this other question I asked earlier today: Find and replace text with all-inclusive wild card

I have a text file like this:

I want= to keep this
        This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1 WebServices and some more "text" that" should "</be> </deleted>
        <this is stuff in tags I want=to begone> and other text I want gone too. </this is stuff in tags I want to begone> 
       A novice programmer walked into a "BAR2" descript keepthis
        and this even more text, let's keep it
    <I actually want this>
    and this= too.

When I use sed -f script.sed file.txt to run this script:

# Check for "aff"
/\baff\b/ {
# Define a label "a"
:a
# If the line does not contain "desc"
/\bdesc\b/!{
    # Get the next line of input and append
    # it to the pattern buffer
    N
    # Branch back to label "a"
    ba
}
# Replace everything between aff and desc
s/\(\baff\)\b.*\b\(desc\b\)/\1TEST DATA\2/
}

I get this as my output:

       I want= to keep this
        This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1 WebServices and some more "text" that" should "</be> </deleted>
        <this is stuff in tags I want=to begone> and other text I want gone too. </this is stuff in tags I want to begone> 
       A novice programmer walked into a "BAR2" descript keepthis
        and this even more text, let's keep it
    <I actually want this>
    and this= too.

However, simply changing the search strings from aff and desc to FOO1 and BAR2:

# Check for "FOO1"
/\bFOO1\b/ {
# Define a label "a"
:a
# If the line does not contain "BAR2"
/\bBAR2\b/!{
    # Get the next line of input and append
    # it to the pattern buffer
    N
    # Branch back to label "a"
    ba
}
# Replace everything between FOO1 and BAR2
s/\(\bFOO1\)\b.*\b\(BAR2\b\)/\1TEST DATA\2/
}

gives the expected output:

I want= to keep this
This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1TEST DATABAR2" descript keepthis
    and this even more text, let's keep it
<I actually want this>
and this= too.

I am completely stumped about what is going on here. Why should searching between FOO1 and BAR2 work differently from the exact same script with aff and desc?


Solution

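  • The end marker should be \bdesc instead of \bdesc\b, both in the loop condition and in the s command.

    Note the \b in the pattern: it matches a word boundary, that is, a position between a word character (a letter, digit, or underscore) and a non-word character. Your text contains the word descript, but not desc as a word of its own, so \bdesc\b never matches and the loop keeps appending lines until the input runs out. BAR2 works because it is followed by a double quote, which is a non-word character.

    Your previous question led me to assume that you wanted whole-word matching. If you don't care about word boundaries, remove the \b escape sequences completely.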
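
To make the fix concrete, here is a minimal sketch of the corrected script run against a shortened version of the input. It assumes GNU sed, since \b is a GNU extension rather than POSIX; the file names sample.txt and fixed.sed and the shortened sample text are made up for this illustration.

```shell
# Assumes GNU sed: \b is a GNU extension and is not guaranteed elsewhere.
# sample.txt and fixed.sed are hypothetical names used only for this sketch.
cat > sample.txt <<'EOF'
keep this
start aff FOO1 some text
middle line to delete
a "BAR2" descript keepthis
keep this too
EOF

cat > fixed.sed <<'EOF'
# Start collecting at a line containing the word "aff"
/\baff\b/ {
:a
# Until the buffer contains a word starting with "desc", append the next line
/\bdesc/!{
    N
    ba
}
# Replace everything between "aff" and the start of "desc..."
s/\(\baff\)\b.*\b\(desc\)/\1TEST DATA\2/
}
EOF

result=$(sed -f fixed.sed sample.txt)
printf '%s\n' "$result"
```

With the trailing \b dropped from the end marker, the three lines from aff through descript collapse into the single line "start affTEST DATAdescript keepthis", and the first and last lines pass through unchanged. Putting desc\b back in either place makes the script print the input unmodified again, which is the behaviour described in the question.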