regex, shell, sed, posix

Strange sed behaviour


I have this POSIX-compliant shell script. It takes a |-delimited string and prepends a - to every substring that is a single character long:

#!/bin/sh
printf '%s\n' "k|k|jill|hill|k" | sed 's/\([|]\|^\)\([[:alnum:]]\)\([|]\|$\)/\1-\2\3/g'

This outputs:

-k|k|jill|hill|-k

Notice that it skips the k sandwiched between two delimiters (i.e., |k|).

Even more strangely, if I replace the anchors in the original snippet with arbitrary literals (^ becomes something; $ becomes different), it does prepend a - to the middle k, though obviously not to the first and last k's:

#!/bin/sh
printf '%s\n' "k|k|jill|hill|k" | sed 's/\([|]\|something\)\([[:alnum:]]\)\([|]\|different\)/\1-\2\3/g'

Outputs:

k|-k|jill|hill|k

At first I thought this was because the ^ and $ anchors weren't optional. However, they evidently are: in the first example, the first k matched without $ and the last k matched without ^.

I'm very curious: why does this not work, and can I do what I want with a similar sed expression?


Solution

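  • With the g flag, matches cannot overlap: the first match (^k|) consumes the | that the second k would need as its leading delimiter, so that k is skipped. Note that if you change the sed script from a global search and replace to a loop, each substitution rescans the line from the start, and you get your desired output: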

    printf '%s\n' "k|k|jill|hill|k" | sed 's/\([|]\|^\)\([[:alnum:]]\)\([|]\|$\)/\1-\2\3/g'
    
    -k|k|jill|hill|-k
    

    versus

    printf '%s\n' "k|k|jill|hill|k" | sed '
        :a
        s/\([|]\|^\)\([[:alnum:]]\)\([|]\|$\)/\1-\2\3/
        ta
    '
    
    -k|-k|jill|hill|-k
    

    ref: https://www.gnu.org/software/sed/manual/html_node/Programming-Commands.html
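
The difference between the two approaches can be sketched as a pair of small functions. This is only an illustration under stated assumptions: the names prefix_global and prefix_loop are hypothetical, and the pattern is rewritten as an extended regular expression with -E, because \| inside a basic regular expression is a GNU extension rather than POSIX. It assumes your sed accepts -E, as GNU and BSD sed do.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.
# Assumption: sed supports -E (extended regular expressions).

# Single pass with the g flag. Matches cannot overlap, so a delimiter
# consumed as the tail of one match is unavailable as the head of the next.
prefix_global() {
    printf '%s\n' "$1" |
        sed -E 's/(^|[|])([[:alnum:]])([|]|$)/\1-\2\3/g'
}

# Loop: replace one match, then branch back to label a while a
# substitution was made. Each iteration rescans the line from the start.
prefix_loop() {
    printf '%s\n' "$1" |
        sed -E -e ':a' -e 's/(^|[|])([[:alnum:]])([|]|$)/\1-\2\3/' -e 'ta'
}

prefix_global 'k|k|jill|hill|k'   # -k|k|jill|hill|-k
prefix_loop   'k|k|jill|hill|k'   # -k|-k|jill|hill|-k
```

The loop terminates because a field that has already been prefixed is two characters long and starts with -, so it can no longer match the single-alphanumeric pattern.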