
std::regex ignore whitespace inside regex command


Can we format a std::regex pattern string with whitespace and line breaks that get ignored, just for better readability? Is there an option available like Python's re.VERBOSE?

Without verbose:

charref = re.compile("&#(0[0-7]+"
                     "|[0-9]+"
                     "|x[0-9a-fA-F]+);")

With verbose:

charref = re.compile(r"""
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
""", re.VERBOSE)

Solution

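  • #include <algorithm>
    #include <cctype>
    #include <string>

    // Strip every whitespace character from the pattern text.
    inline std::string remove_ws(std::string in) {
      // Lambda: std::isspace is overloaded and needs an unsigned char value.
      in.erase(std::remove_if(in.begin(), in.end(),
                              [](unsigned char c) { return std::isspace(c) != 0; }),
               in.end());
      return in;
    }
    
    inline std::string operator""_nows(const char* str, std::size_t length) {
      return remove_ws({str, str+length});
    }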
    

    Now, this doesn't support # comments, but adding that should be easy. Write a function that strips them from the string and apply it before removing the whitespace:

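    // Needs <regex>. Simplifying assumption: a comment starts at a '#' that
    // follows whitespace (or begins the string), so a class such as [#] survives.
    std::string remove_comments(std::string const& s)
    {
      std::regex comment_re("(^|\\s)#[^\n]*");
      return std::regex_replace(s, comment_re, "$1");
    }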
    
    std::string operator""_verbose(const char* str, std::size_t length) {
      return remove_ws( remove_comments( {str, str+length} ) );
    }
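
A regex-based comment stripper cannot easily tell a comment from a '#' inside a character class or after a backslash. The following is a minimal sketch of a single-pass alternative that follows Python-like rules; the name strip_verbose is hypothetical and not part of the original answer, and it assumes ECMAScript syntax, where an unescaped ']' always closes a character class.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: removes whitespace and '#' comments in one pass,
// leaving escaped characters and character classes untouched.
inline std::string strip_verbose(const std::string& pattern) {
  std::string out;
  bool in_class = false;  // true while inside [...]
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    const char c = pattern[i];
    if (c == '\\' && i + 1 < pattern.size()) {
      // Keep an escape together with the character it escapes.
      out += c;
      out += pattern[++i];
    } else if (in_class) {
      // Everything inside a character class is kept verbatim.
      if (c == ']') in_class = false;
      out += c;
    } else if (c == '[') {
      in_class = true;
      out += c;
    } else if (c == '#') {
      // Skip the comment up to, but not including, the end of the line.
      while (i + 1 < pattern.size() && pattern[i + 1] != '\n') ++i;
    } else if (!std::isspace(static_cast<unsigned char>(c))) {
      out += c;
    }
  }
  return out;
}
```

With this variant, whitespace and '#' stay literal inside [...] and after a backslash, which is closer to what Python's verbose mode does.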
    

    Once finished, we get:

    std::regex charref(R"---(
     &[#]                # Start of a numeric entity reference
     (
         0[0-7]+         # Octal form
       | [0-9]+          # Decimal form
       | x[0-9a-fA-F]+   # Hexadecimal form
     )
     ;                   # Trailing semicolon
    )---"_verbose);
    

    and done.
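
To check the whole approach end to end, here is a self-contained sketch that builds the pattern through a _verbose literal and matches a few inputs. It assumes that a comment begins at a '#' preceded by whitespace, so that the [#] in the pattern is left alone; is_charref is a hypothetical wrapper added only for this illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Drop '#' comments. Assumption: a comment starts at a '#' that follows
// whitespace or begins the string; the captured whitespace is put back.
inline std::string remove_comments(const std::string& s) {
  static const std::regex comment_re("(^|\\s)#[^\n]*");
  return std::regex_replace(s, comment_re, "$1");
}

// Remove every whitespace character.
inline std::string remove_ws(std::string in) {
  in.erase(std::remove_if(in.begin(), in.end(),
                          [](unsigned char c) { return std::isspace(c) != 0; }),
           in.end());
  return in;
}

inline std::string operator""_verbose(const char* str, std::size_t length) {
  return remove_ws(remove_comments(std::string(str, length)));
}

// Hypothetical wrapper: true if text is a numeric character reference.
inline bool is_charref(const std::string& text) {
  static const std::regex charref(R"---(
     &[#]                # Start of a numeric entity reference
     (
         0[0-7]+         # Octal form
       | [0-9]+          # Decimal form
       | x[0-9a-fA-F]+   # Hexadecimal form
     )
     ;                   # Trailing semicolon
  )---"_verbose);
  return std::regex_match(text, charref);
}
```

The literal runs when the regex object is constructed, so the stripping cost is paid once; the function-local static keeps the compiled std::regex around between calls.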