bash shell parameter-expansion

${foo//(/\\(} not working with extglobs enabled


I am trying to escape parentheses using parameter expansion. However, with extglob enabled, the following code doesn't work:

#!/usr/bin/env bash

shopt -s extglob

foo='file(2)'
foo=${foo//(/\\(}
foo=${foo//)/\\)}

printf '%s\n' "$foo"

# Expected:  file\(2\)
# Actual:    file(2\)

It correctly outputs file\(2\) when I disable extglob or explicitly escape the left parenthesis like this:

foo=${foo//\(/\\(}  

Why does extglob cause that? I don't see any extglob pattern there. Also, the right parenthesis works fine without a backslash.

Tested online at tutorialspoint.com and also locally using:

GNU bash, version 4.3.30(1)-release (x86_64-unknown-linux-gnu)
GNU bash, version 4.4.18(1)-release (x86_64-unknown-linux-gnu)
GNU bash, version 5.0.0(2)-alpha (x86_64-pc-linux-gnu)

Solution

  • This is a bug due to an optimization in bash.

    When replacing a pattern, bash first checks whether the pattern matches anywhere in the string. If it doesn't, there is no point in doing any search and replace. It performs this check by constructing a new pattern, surrounding the original with *..* as necessary:

      /* If the pattern doesn't match anywhere in the string, go ahead and
         short-circuit right away.  A minor optimization, saves a bunch of
         unnecessary calls to strmatch (up to N calls for a string of N
         characters) if the match is unsuccessful.  To preserve the semantics
         of the substring matches below, we make sure that the pattern has
         `*' as first and last character, making a new pattern if necessary. */
      /* XXX - check this later if I ever implement `**' with special meaning,
         since this will potentially result in `**' at the beginning or end */
      len = STRLEN (pat);
      if (pat[0] != '*' || (pat[0] == '*' && pat[1] == LPAREN && extended_glob) || pat[len - 1] != '*')
        {
          int unescaped_backslash;
          char *pp;
    
          p = npat = (char *)xmalloc (len + 3);
          p1 = pat;
          if (*p1 != '*' || (*p1 == '*' && p1[1] == LPAREN && extended_glob))
            *p++ = '*';
    

    For the pattern ( from the question, the pattern it tries to match against the string ends up being *(*

    The opening *( is now unintentionally recognized as the start of an extglob, but when bash fails to find the closing ), it matches the pattern as a string instead:

      prest = PATSCAN (p + (*p == L('(')), pe, 0); /* ) */
      if (prest == 0)
        /* If PREST is 0, we failed to scan a valid pattern.  In this
           case, we just want to compare the two as strings. */
        return (STRCOMPARE (p - 1, pe, s, se));
    

    This means that unless the string being modified is literally *(*, the optimization incorrectly rejects it, concluding there is nothing to replace. Of course, this also means that it works correctly for *(* itself:

    $ f='*(*'; echo "${f//(/\\(}"
    *\(*
    

    If you were to remove this optimization check from the source code:

    diff --git a/subst.c b/subst.c
    index fc00cab0..f063f784 100644
    --- a/subst.c
    +++ b/subst.c
    @@ -4517,8 +4517,6 @@ match_upattern (string, pat, mtype, sp, ep)
       c = strmatch (npat, string, FNMATCH_EXTFLAG | FNMATCH_IGNCASE);
       if (npat != pat)
         free (npat);
    -  if (c == FNM_NOMATCH)
    -    return (0);
    
       len = STRLEN (string);
       end = string + len;
    

    then it would work correctly in your case:

    $ ./bash -c 'f="my string(1) with (parens)"; echo "${f//(/\\(}"'
    my string\(1) with \(parens)
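
    On an unpatched bash, the escaped form from the question can serve as a workaround. With the backslash, the wrapped pattern becomes *\(* rather than *(*, so it no longer looks like the start of an extglob and the short-circuit check should not reject the string. The sketch below wraps that form in a function; the name escape_parens is hypothetical and not part of bash:

```shell
#!/usr/bin/env bash

shopt -s extglob

# escape_parens is a hypothetical helper name used for illustration only.
escape_parens() {
  local s=$1
  # Backslash-escape the parentheses inside the pattern so that bash
  # cannot read "(" as the opening of an extglob sub-pattern.
  s=${s//\(/\\(}
  s=${s//\)/\\)}
  printf '%s\n' "$s"
}

escaped=$(escape_parens 'file(2)')
printf '%s\n' "$escaped"
```

    This should print file\(2\), the same result the question reports for the escaped pattern, whether or not extglob is enabled.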