javascript regex sed capture-group

Regex capture group works in Javascript and regex101, but not in sed


In regex101: https://regex101.com/r/FM88LA/1

In my browser console:

x='"AbCd123|999"';
"\"AbCd123|999\""
x.match(/[^\""|]+/)
Array [ "AbCd123" ]

Using sed in the shell:

(base) balter@winmac:~/winhome/CancerGraph/TCGA$ echo '"AbCd123|99999"' | sed -En 's/([^\"|]+)/\1/p'
"AbCd123|99999"
(base) balter@winmac:~/winhome/CancerGraph/TCGA$ echo '"AbCd123|99999"' | sed -En 's/\"([^|]+)/\1/p'
AbCd123|99999"

Solution

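  • This behavior is expected. With the -n option and the p flag, sed prints the whole line after a successful substitution: the text that was not matched, left untouched, plus the replacement for the part that was matched. In your first command the match AbCd123 is replaced with itself, so the line comes out unchanged; in the second, only the leading " is dropped.

    That means you can get your "match" with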

    echo '"AbCd123|99999"' | sed -En 's/["|]*([^"|]+).*/\1/p'
    

    See the online demo.

    Here, ["|]* first skips any leading " and | characters, then ([^"|]+) captures one or more characters other than " and |, and finally .* matches the rest of the line.

    Everything that was matched but not captured is removed, because the replacement consists only of \1, the Group 1 value (captured with ([^"|]+)).
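
As a minimal sketch of the two behaviors described above, the following shell snippet runs the first command from the question next to the pattern from this answer. It assumes a sed that supports -E (GNU and BSD sed both do); first_field is a hypothetical helper name introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer): print the first run
# of characters that are neither a double quote nor a pipe.
first_field() {
    printf '%s\n' "$1" | sed -En 's/["|]*([^"|]+).*/\1/p'
}

# Pattern from the question: only the matched run AbCd123 is replaced (with
# itself), so the unmatched quotes, pipe and digits are printed unchanged.
printf '%s\n' '"AbCd123|99999"' | sed -En 's/([^"|]+)/\1/p'
# prints: "AbCd123|99999"

# Pattern from this answer: the whole line is matched and only group 1 is
# kept in the replacement.
first_field '"AbCd123|99999"'
# prints: AbCd123
```

Because the pattern consumes the entire line, nothing outside the match is left over to be printed, which is what makes the sed output look like the result of x.match() in JavaScript.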