bash parameter-expansion extglob

How does negative matching work in extglob in parameter expansion?


Problem

The behaviour of

!(pattern-list)

does not work the way I would expect when used in parameter expansion, specifically

${parameter/pattern/string}

Input

a="1 2 3 4 5 6 7 8 9 10"

Test cases

$ printf "%s\n" "${a/!([0-9])/}"
[blank]
#expected 12 3 4 5 6 7 8 9 10

$ printf "%s\n" "${a/!(2)/}"
[blank]
#expected  2 3 4 5 6 7 8 9 10

$ printf "%s\n" "${a/!(*2*)/}"
2 3 4 5 6 7 8 9 10
#Produces the behaviour expected in previous one, not sure why though

$ printf "%s\n" "${a/!(*2*)/,}"
,2 3 4 5 6 7 8 9 10
#Expected after previous worked

$ printf "%s\n" "${a//!(*2*)/}"
2
#Expected again previous worked

$ printf "%s\n" "${a//!(*2*)/,}"
,,2,
#Why are there 3 commas???

Specs

GNU bash, version 4.2.46(1)-release (x86_64-redhat-linux-gnu)

Notes

These are very basic examples, so if it is possible to include more complex examples with explanations in the answer then please do.

Any more info or examples needed let me know in the comments.

Have already looked at How does extglob work with shell parameter expansion?, and have even commented on what the problem is with that particular problem, so please don't mark as a dupe.


Solution

  • Parameter expansion of the form ${parameter/pattern/string} (where pattern doesn't start with a /) works by finding the leftmost longest substring in the value of the variable parameter that matches pattern and replacing it with string. In other words, $parameter is decomposed into three parts, prefix, match, and suffix, such that

    1. $parameter == "${prefix}${match}${suffix}"
    2. $prefix is the shortest possible string enabling the other requirements to be fulfilled (i.e. the match, if at all possible, occurs in the leftmost position)
    3. $match matches pattern and is as long as possible
    4. any of $prefix, $match and/or $suffix can be empty

    and the result of ${parameter/pattern/string} is "${prefix}string${suffix}".
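The decomposition above can be sketched in bash itself. This is only a brute-force model of the four rules, not how bash is implemented internally; the helper name decompose and the variables prefix, match and suffix are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable extended patterns such as !(pattern-list)

# Hypothetical helper: find the leftmost, longest substring of $1 that
# matches pattern $2 and store the three parts in prefix, match and suffix.
# Returns 1 if no substring (not even the empty one) matches.
decompose() {
    local value=$1 pattern=$2 start len
    for (( start = 0; start <= ${#value}; start++ )); do         # leftmost first
        for (( len = ${#value} - start; len >= 0; len-- )); do   # longest first
            if [[ ${value:start:len} == $pattern ]]; then
                prefix=${value:0:start}
                match=${value:start:len}
                suffix=${value:start+len}
                return 0
            fi
        done
    done
    return 1
}

a="1 2 3 4 5 6 7 8 9 10"
decompose "$a" '!(*2*)'
printf 'prefix=[%s] match=[%s] suffix=[%s]\n' "$prefix" "$match" "$suffix"
# prints: prefix=[] match=[1 ] suffix=[2 3 4 5 6 7 8 9 10]
printf '%s\n' "${prefix},${suffix}"
# prints: ,2 3 4 5 6 7 8 9 10   (same as "${a/!(*2*)/,}" in the question)
```

The outer loop tries start positions from left to right and the inner loop tries lengths from longest to shortest, so the first hit is the leftmost longest match.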

    For the global replacement form of this type of parameter expansion (${parameter//pattern/string}), the same process is performed recursively on the suffix part. A zero-length match, however, is handled as a special case to prevent infinite recursion:

    • if "${prefix}${match}" != ""

      "${parameter//pattern/string}" = "${prefix}string${suffix//pattern/string}"
      

      else suffix=${parameter:1} and

      "${parameter//pattern/string}" = "string${parameter:0:1}${suffix//pattern/string}"
      

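The global form, including the zero-length special case, can be modelled the same way. The sketch below is an assumption-laden illustration of the rules just described, written as a loop rather than a recursion. The names decompose, replace_all and result are hypothetical, and an empty remainder is assumed to end the process without a further match.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable extended patterns such as !(pattern-list)

# Hypothetical helper: leftmost, longest match of pattern $2 in $1;
# sets prefix, match and suffix, or returns 1 if nothing matches.
decompose() {
    local value=$1 pattern=$2 start len
    for (( start = 0; start <= ${#value}; start++ )); do
        for (( len = ${#value} - start; len >= 0; len-- )); do
            if [[ ${value:start:len} == $pattern ]]; then
                prefix=${value:0:start}
                match=${value:start:len}
                suffix=${value:start+len}
                return 0
            fi
        done
    done
    return 1
}

# Hypothetical model of ${parameter//pattern/string}; the answer is left in
# the variable result.
replace_all() {
    local rest=$1 pattern=$2 string=$3 out=
    local prefix match suffix
    while [[ -n $rest ]]; do
        decompose "$rest" "$pattern" || break   # no match: keep the rest as is
        if [[ -n $prefix$match ]]; then
            out+=$prefix$string                 # ordinary case
            rest=$suffix
        else
            out+=$string${rest:0:1}             # zero-length match: emit the
            rest=${rest:1}                      # replacement, copy one character
        fi
    done
    result=$out$rest
}

a="1 2 3 4 5 6 7 8 9 10"
replace_all "$a" '!(*2*)' ','
printf '%s\n' "$result"    # prints: ,,2,
replace_all "$a" '!(*2*)' ''
printf '%s\n' "$result"    # prints: 2
```

Both outputs agree with what the question reports for "${a//!(*2*)/,}" and "${a//!(*2*)/}" on bash 4.2.46.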
    Now let's analyze the cases individually:

    • "${a/!([0-9])/}" --> prefix='' match='1 2 3 4 5 6 7 8 9 10' suffix=''. Indeed, '1 2 3 4 5 6 7 8 9 10' is not a string consisting of a single digit, and therefore it matches the pattern !([0-9]). Hence the empty result of expansion.

    • "${a/!(2)/}" --> prefix='' match='1 2 3 4 5 6 7 8 9 10' suffix=''. Similar to the above, '1 2 3 4 5 6 7 8 9 10' is not a string consisting of the single character '2', and therefore it matches the pattern !(2). Hence the empty result of expansion.

    • "${a/!(*2*)/}" --> prefix='' match='1 ' suffix='2 3 4 5 6 7 8 9 10'. The substring '1 ' doesn't match the pattern *2*, and therefore it matches the pattern !(*2*).

    • "${a/!(*2*)/,}". There were no surprises here, so no need to elaborate.

    • "${a//!(*2*)/}". There were no surprises here, so no need to elaborate.

    • "${a//!(*2*)/,}" --> prefix='' match='1 ' suffix='2 3 4 5 6 7 8 9 10', which yields the first comma. Then ${suffix//!(*2*)/,} expands to ",2," as follows. Every non-empty substring starting at the beginning of suffix contains '2', so only the empty string there matches the pattern !(*2*), producing an extra comma in the result. Since the zero-length match special case (described above) was triggered, the first character of suffix ('2') is forcibly consumed and copied to the output, leaving us with ' 3 4 5 6 7 8 9 10', which matches the !(*2*) pattern in its entirety and is replaced with the last comma that we see in the final result of the expansion.
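The key point in the cases above is that !(pattern-list) is tested against a whole candidate substring, not character by character. A small check with [[ ... == ... ]] makes this visible. The helper name matches is hypothetical, and the last line assumes the intent of the first test case was to delete a single non-digit character, for which a negated bracket expression is the usual tool.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable extended patterns such as !(pattern-list)

a="1 2 3 4 5 6 7 8 9 10"

# Hypothetical helper: does the whole string $1 match pattern $2?
matches() { if [[ $1 == $2 ]]; then echo yes; else echo no; fi; }

whole_string=$(matches "$a" '!([0-9])')   # yes: $a is not one single digit
one_digit=$(matches "7" '!([0-9])')       # no:  "7" is exactly one digit
with_two=$(matches "1 2" '!(*2*)')        # no:  the string contains a 2
without_two=$(matches "1 " '!(*2*)')      # yes: no 2 anywhere in it
empty=$(matches "" '!(*2*)')              # yes: the empty string has no 2

printf '%s %s %s %s %s\n' "$whole_string" "$one_digit" "$with_two" "$without_two" "$empty"
# prints: yes no no yes yes

# Assumed intent of the first test case: remove the first non-digit character.
first_nondigit_removed=${a/[!0-9]/}
printf '%s\n' "$first_nondigit_removed"
# prints: 12 3 4 5 6 7 8 9 10
```

Because the entire value of a already satisfies !([0-9]) and !(2), the longest match is the whole string, which is why those two expansions come out empty.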