Tags: c++, regex, c++11, gcc, gnu

C++ standard regex: difference between -std=c++11 and -std=gnu++11


There seems to be a difference in std::regex behavior depending on whether the code is compiled with or without GNU extensions.

The following code throws an exception when compiled with -std=c++11, but works with -std=gnu++11:

#include <regex>
#include <string>
#include <iostream>

int main(int argc, char **argv) {

    std::string rex { "\\[1\\]" };
    std::string str { "[1]" };
    std::regex regex(rex, std::regex::extended);
    auto match = std::regex_match(str.begin(), str.end(), regex);
    std::cout << "Result is " << match << std::endl;
    return 0;
}

I tried gcc versions from 4.9.4 up to 9.2 and saw the same behavior in all of them. Any ideas why this code behaves differently?


Solution

  • std::regex::extended uses extended POSIX regular expressions. According to those syntax rules, a backslash can only precede a "special character", which is one of .[\()*+?{|^$. While a left bracket [ is a special character, the right bracket ] is not. So your regular expression should be "\\[1]" instead of "\\[1\\]" to be standard-compliant.

    Looking at the standard library source code, there is the following in regex_scanner.tcc:

    #ifdef __STRICT_ANSI__
          // POSIX says it is undefined to escape ordinary characters
          __throw_regex_error(regex_constants::error_escape,
                      "Unexpected escape character.");
    #else
          _M_token = _S_token_ord_char;
          _M_value.assign(1, __c);
    #endif
    

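    This shows that accepting an escaped non-special character is a GNU extension: -std=c++11 defines __STRICT_ANSI__, so the scanner throws error_escape, while -std=gnu++11 leaves it undefined, so \] is treated as an ordinary ]. I don't know where this extension is documented.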
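
To make the difference easy to check, here is a minimal sketch of two helpers. The names ere_compiles and ere_full_match are hypothetical, made up for this illustration, and are not part of the question or of libstdc++. Both build the pattern with std::regex::extended and turn a thrown std::regex_error into a false result.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns true if
// `pattern` compiles as a POSIX extended regex, false if std::regex throws.
bool ere_compiles(const std::string& pattern) {
    try {
        const std::regex re(pattern, std::regex::extended);
        (void)re;  // only the construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Hypothetical helper: full match of `text` against the ERE `pattern`.
// A pattern that fails to compile is reported as "no match".
bool ere_full_match(const std::string& pattern, const std::string& text) {
    try {
        const std::regex re(pattern, std::regex::extended);
        return std::regex_match(text, re);
    } catch (const std::regex_error&) {
        return false;
    }
}
```

With the corrected pattern, ere_full_match("\\[1]", "[1]") should return true under both -std=c++11 and -std=gnu++11. Going by the scanner source quoted above, ere_compiles("\\[1\\]") would be expected to return false with -std=c++11 and true with -std=gnu++11 on libstdc++; other standard library implementations may behave differently.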