Search code examples
regexsedcharacter-class

Hyphen and underscore not compatible in sed


I'm having trouble getting sed to recognize both hyphen and underscore in its pattern string.

Does anyone know why

[a-z|A-Z|0-9|\-|_]

in the following example works like

[a-z|A-Z|0-9|_]

?

$  cat /tmp/sed_undescore_hypen
lkjdaslf lkjlsadjfl dfpasdiuy service-type = service-1; jaldkfjlasdjflk address = address1; kldjfladsf
lkjdaslf lkjlsadjfl dfasdf  service-type = service_1; jaldkfjlasdjflk address = address1; kldjfladsf

$  sed 's/.*\(service-type = [a-z|A-Z|0-9|\-|_]*\);.*\(address = .*\);.*/\1    \2/g' /tmp/sed_undescore_hypen
lkjdaslf lkjlsadjfl dfpasdiuy service-type = service-1; jaldkfjlasdjflk address = address1; kldjfladsf
service-type = service_1    address = address1

$  sed 's/.*\(service-type = [a-z|A-Z|0-9|\-]*\);.*\(address = .*\);.*/\1    \2/g' /tmp/sed_undescore_hypen
service-type = service-1    address = address1
lkjdaslf lkjlsadjfl dfasdf  service-type = service_1; jaldkfjlasdjflk address = address1; kldjfladsf

$  sed 's/.*\(service-type = [a-z|A-Z|0-9|_]*\);.*\(address = .*\);.*/\1    \2/g' /tmp/sed_undescore_hypen
lkjdaslf lkjlsadjfl dfpasdiuy service-type = service-1; jaldkfjlasdjflk address = address1; kldjfladsf
service-type = service_1    address = address1

Solution

  • As mentioned, you don't need anything to separate your ranges in a bracket expression. All that will do is adding | to the characters matched by the expression.

    Then, to add a hyphen, you can put it as the first or last character in the expression:

    [a-zA-Z0-9_-]
    

    And finally, ranges like a-z don't necessarily mean abcd...xyz, depending on your locale. You could use a POSIX character class instead:

    [[:alnum:]_-]
    

    Where [:alnum:] corresponds to all alphanumeric characters of your locale. In the C locale, it corresponds to 0-9A-Za-z.