
multiple lines selection in sed with custom delimiter, why do I need to escape twice?


What I want to do

  • I want to parse a file to remove all lines before and after given patterns A and B (excluding pattern A and B).
  • I also want to use a custom delimiter instead of /.

What I tried

Deleting lines

Inspired by this question, the following does give the expected output

'/A/,/B/{//!d}'

Replacing the delimiter

Now I want to modify this command to use a delimiter other than /; let's say | instead.

I would write the following

'\|B|,|F|{//!d}'

but this fails with the following error message

sed: -e expression #1, char 6: unexpected `,'

Escaping the delimiter

Escaping the third | fixes my issue.

'\|B|,\|F|{//!d}'

The question

Why can't I freely replace the delimiter in this solution

'\|B|,|F|{//!d}'

such that I had to do that instead

'\|B|,\|F|{//!d}'

Solution

  • why do I need to escape twice?

    You are not escaping anything; this is the syntax of the sed language. A context address, which is part of the sed language, has the syntax:

    /<regexp>/
    

    or

    \C<regexp>C
    

    where C is any single character other than a backslash or a newline. See, for example, the POSIX standard, https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html, section "Regular Expressions in sed":

    In a context address, the construction "\cBREc", where c is any character other than <backslash> or <newline>, shall be identical to "/BRE/". If the character designated by c appears following a <backslash>, then it shall be considered to be that literal character, which shall not terminate the BRE. For example, in the context address "\xabc\xdefx", the second x stands for itself, so that the BRE is "abcxdef".
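The quoted rule can be checked with a small sketch. The sample input and the variable names `input` and `literal_match` are made up for illustration, and `#` is chosen as the delimiter only because it has no special meaning in a BRE.

```shell
# Sample input: one line that contains the delimiter character, one that does not.
input=$(printf 'a#b\nab\n')

# \#...# is a context address delimited by '#'.
# Inside it, \# stands for a literal '#', so the BRE is "a#b".
literal_match=$(printf '%s\n' "$input" | sed -n '\#a\#b#p')

printf '%s\n' "$literal_match"   # prints: a#b
```

Only the line containing a literal `#` is printed, which matches the behaviour the standard describes for `\xabc\xdefx`.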

    The \ is part of the context address syntax, so it has to appear wherever that form of address is used. There is no "escaping"; it is just part of the sed language syntax.

    If you use two context addresses and both use a custom delimiter, both of them simply have to be written in the form \C<regexp>C.
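A minimal sketch of the two-address case, using the commands from the question on invented sample input. It assumes GNU sed, which accepts `}` directly after the last command; the variable names are hypothetical.

```shell
# Sample input, one letter per line.
input=$(printf 'A\nB\nC\nD\nF\nG\n')

# Both addresses use the \C<regexp>C form, so the range runs from B to F.
# Inside the range, //!d deletes every line that does not match the
# regex used last (B on the first line, F afterwards).
range_ok=$(printf '%s\n' "$input" | sed '\|B|,\|F|{//!d}')

# Here the second address is written as a bare |F|, which is not an
# address at all, so sed rejects the whole script.
if printf '%s\n' "$input" | sed '\|B|,|F|{//!d}' >/dev/null 2>&1; then
    range_bad_status=accepted
else
    range_bad_status=rejected
fi

printf '%s\n' "$range_ok"          # prints A, B, F, G on separate lines
printf '%s\n' "$range_bad_status"  # prints: rejected
```

Each address is parsed on its own, so the leading `\` on the first address does not carry over to the second one.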

    If it is no trouble, can you complete your answer to include a quick explanation regarding why the syntax of s|a|b| seemingly does not require the delimiter character to be reintroduced for the second pattern?

    sed was invented the way it is by Lee E. McMahon, or rather ed was invented the way it is by the prophet Ken Thompson. There is no particular "why", rather "it is what it is". POSIX in this case only standardizes existing sed implementations.

    Quoting again from the POSIX spec:

    [2addr]s/BRE/replacement/flags

    Substitute the replacement string for instances of the BRE in the pattern space. Any character other than <backslash> or <newline> can be used instead of a <slash> to delimit the BRE and the replacement. Within the BRE and the replacement, the BRE delimiter itself can be used as a literal character if it is preceded by a <backslash>.
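The contrast with context addresses can be sketched as follows. One way to read the difference (an interpretation, not something the quoted text states) is that the command letter `s` already announces that the next character is the delimiter, whereas an address has no such introducer and therefore needs the `\`. The inputs and variable names below are invented for illustration.

```shell
# The delimiter follows the command letter "s" directly; no leading backslash.
sub_plain=$(printf 'a/b\n' | sed 's|/|-|')

# A backslash before the delimiter makes it a literal character in the BRE.
sub_escaped=$(printf 'a,b\n' | sed 's,\,,;,')

# The address and the command can each pick their own delimiter;
# only the address needs the leading "\".
sub_addr=$(printf '/usr/bin\n/etc\n' | sed '\#^/usr#s,/usr,/opt,')

printf '%s\n' "$sub_plain"    # prints: a-b
printf '%s\n' "$sub_escaped"  # prints: a;b
printf '%s\n' "$sub_addr"     # prints /opt/bin and /etc on separate lines
```

Within a single `s` command the same delimiter separates the BRE, the replacement, and the flags, so it is chosen once, right after the `s`.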