I have this POSIX-compliant shell script. It takes a string delimited by | and prepends a - to each substring that is a single character long:
#!/bin/sh
printf '%s\n' "k|k|jill|hill|k" | sed 's/\([|]\|^\)\([[:alnum:]]\)\([|]\|$\)/\1-\2\3/g'
This outputs:
-k|k|jill|hill|-k
Notice it doesn't account for the k sandwiched between two delimiters (i.e., |k|).
Even more strangely, if I replace the anchors in the original snippet with arbitrary literal text (^ becomes something; $ becomes different), it does prepend a - to the middle k, but obviously not to the first and last k's:
#!/bin/sh
printf '%s\n' "k|k|jill|hill|k" | sed 's/\([|]\|something\)\([[:alnum:]]\)\([|]\|different\)/\1-\2\3/g'
Outputs:
k|-k|jill|hill|k
At first I thought that it was because the $ and ^ anchors weren't optional. However, they obviously are optional: in the first example, the first match does not use $ and the last match does not use ^.
I'm very curious to know: why is this not working, and can I do what I want with a similar sed expression?
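With the g flag, matches cannot overlap: the | that ends the first match (k|) has already been consumed, so it cannot also serve as the leading delimiter of the next k. If you change the sed script from a global search and replace to a loop, you get the desired output: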
printf '%s\n' "k|k|jill|hill|k" | sed 's/\([|]\|^\)\([[:alnum:]]\)\([|]\|$\)/\1-\2\3/g'
-k|k|jill|hill|-k
versus
printf '%s\n' "k|k|jill|hill|k" | sed '
:a
s/\([|]\|^\)\([[:alnum:]]\)\([|]\|$\)/\1-\2\3/
ta
'
-k|-k|jill|hill|-k
ref: https://www.gnu.org/software/sed/manual/html_node/Programming-Commands.html
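To make the loop easier to reuse, here is a minimal sketch that wraps it in a shell function. The function name prefix_single_chars is made up for illustration, and the sketch assumes a sed that accepts -E (extended regular expressions), which GNU and BSD sed both do; the \| alternation used in the snippets above is a GNU extension to basic regular expressions. Each pass through the loop rescans the line from the start, so a | consumed by one substitution is available again on the next pass. The loop stops because a field that already reads -k no longer matches: - is not in [[:alnum:]].

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration): prefix every
# single-character field of a |-delimited string with "-".
# Assumes sed supports -E (extended regular expressions).
prefix_single_chars() {
    printf '%s\n' "$1" | sed -E \
        -e ':a' \
        -e 's/(^|[|])([[:alnum:]])([|]|$)/\1-\2\3/' \
        -e 'ta'
    # :a  defines a label
    # s   rewrites only the first match on each pass (no g flag)
    # ta  jumps back to :a whenever the substitution changed the line
}

prefix_single_chars 'k|k|jill|hill|k'
```

Run against the input from the question, this should print the same line as the loop above: -k|-k|jill|hill|-k.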