bash sed/awk/perl: removing a group of characters except when it matches specific strings


  • The goal is to remove the group of alphanumeric characters (plus '_', '.' and '-') that precedes the second colon (:), unless that group is http or https.
  • The second colon must be removed along with the group.
  • In addition, the line must be left unchanged if the third field (the one after the second colon) contains at least one colon.

For instance, the following list...

- name_1: name_11:value-1
  name_2: value-2
  name_3: http://value-3
- name_4: https://value-4
  name_5: name_51:value-5
  name_6: value-61:value-62:value-63

... must be transformed into:

- name_1: value-1
  name_2: value-2
  name_3: http://value-3
- name_4: https://value-4
  name_5: value-5
  name_6: value-61:value-62:value-63

The following sed command removes every second "name" field, even when it matches 'http[s]*', so the URL schemes are lost:

sed -E 's|([[:blank:]-]+[[:alnum:]_\.-]+:[[:blank:]]+)[[:alnum:]_\.-]+:([^:]+)$|\1\2|g' file

Output:

- name_1: value-1
  name_2: value-2
  name_3: //value-3
- name_4: //value-4
  name_5: value-5
  name_6: value-61:value-62:value-63

Any suggestions?


Solution

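  • Use an alternation, ((https?:)|[[:alnum:]_.-]+:), whose inner group captures http: or https: when present, and put that group (\3) back in the replacement. For any other name the inner group is empty, so the name and its colon are dropped: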

    sed -E 's/([[:blank:]-]+[[:alnum:]_.-]+:[[:blank:]]+)((https?:)|[[:alnum:]_.-]+:)([^:]+)$/\1\3\4/g' file
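
As a quick check, the command can be run against the sample input from the question. This is only a minimal sketch: it assumes GNU sed (it has not been checked against other sed implementations), and the variable names input and result are illustrative, not part of the original post.

```shell
# Sample input copied from the question (the variable name is illustrative).
input='- name_1: name_11:value-1
  name_2: value-2
  name_3: http://value-3
- name_4: https://value-4
  name_5: name_51:value-5
  name_6: value-61:value-62:value-63'

# \1 = "key: " prefix, \3 = "http:" or "https:" (empty for any other name),
# \4 = the value, which must not contain a colon for the line to match.
result=$(printf '%s\n' "$input" |
  sed -E 's/([[:blank:]-]+[[:alnum:]_.-]+:[[:blank:]]+)((https?:)|[[:alnum:]_.-]+:)([^:]+)$/\1\3\4/g')

printf '%s\n' "$result"
```

Lines such as name_6 are left alone because the trailing ([^:]+)$ cannot match a value that still contains a colon, so the substitution never applies to them.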