Regex character class subtraction in C++


I'm writing a C++ program that needs to take regular expressions defined in an XML Schema file and use them to validate XML data. The problem is that the flavor of regular expressions used by XML Schema does not seem to be directly supported in C++.

For example, there are a couple of special character classes, \i and \c, that are not defined by default. The XML Schema regex language also supports something called "character class subtraction", which does not seem to be supported in C++ either.

Allowing the use of the \i and \c special character classes is pretty simple: I can just look for "\i" or "\c" in the regular expression and replace them with their expanded versions. Getting character class subtraction to work is a much more daunting problem...
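
The \i and \c replacement described above could look roughly like the following. This is only a minimal sketch: the helper name expandXsdEscapes is hypothetical, the expansions are ASCII-only stand-ins for the much larger Unicode name-character sets that XML Schema actually defines, and it does not handle \i or \c appearing inside a bracket expression.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of any library): rewrites the XSD escapes
// \i and \c into explicit bracket expressions that std::regex accepts.
// Assumption: ASCII-only approximations of the real Unicode name sets.
std::string expandXsdEscapes(const std::string& pattern)
{
    const std::string nameStart = "[A-Za-z_:]";       // approximates \i
    const std::string nameChar  = "[-.0-9A-Za-z_:]";  // approximates \c

    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
        if (pattern[i] == '\\' && i + 1 < pattern.size())
        {
            const char next = pattern[++i];  // consume the escaped character
            if (next == 'i')
            {
                out += nameStart;
            }
            else if (next == 'c')
            {
                out += nameChar;
            }
            else
            {
                // Any other escape, including an escaped backslash,
                // is copied through unchanged.
                out += '\\';
                out += next;
            }
        }
        else
        {
            out += pattern[i];
        }
    }
    return out;
}

// Usage sketch: call from main() or a unit test.
void expandXsdEscapesExample()
{
    const std::regex rx(expandXsdEscapes("\\i\\c*"));
    assert(std::regex_match("xs:element", rx));
    assert(!std::regex_match("1element", rx));
}
```

Scanning escape by escape, rather than doing a plain search-and-replace, keeps an escaped backslash followed by a literal "i" from being rewritten by mistake.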

For example, this regular expression that is valid in an XML Schema definition throws an exception in C++ saying it has unbalanced square brackets.

#include <iostream>
#include <regex>

int main()
{
    try
    {
        // Match any lowercase letter that is not a vowel
        std::regex rx("[a-z-[aeiou]]");
    }
    catch (const std::regex_error& ex)
    {
        std::cout << ex.what() << std::endl;
    }
}

How can I get C++ to recognize character class subtraction within a regex? Or even better, is there a way to just use the XML Schema flavor of regular expressions directly within C++?


Solution

  • Okay, after going through the other answers I tried out a few different things and ended up using the xmlRegexp functionality from libxml2.

    The xmlRegexp-related functions are very poorly documented, so I figured I would post an example here because others may find it useful:

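    #include <iostream>
    #include <libxml/xmlmemory.h>   // xmlFree
    #include <libxml/xmlregexp.h>
    #include <libxml/xmlstring.h>   // xmlCharStrdup

    int main()
    {
        LIBXML_TEST_VERSION;

        xmlChar* str = xmlCharStrdup("bcdfg");
        xmlChar* pattern = xmlCharStrdup("[a-z-[aeiou]]+");
        xmlRegexp* regex = xmlRegexpCompile(pattern);

        // xmlRegexpCompile returns NULL if the pattern is invalid;
        // xmlRegexpExec returns 1 on a match, 0 on no match, negative on error.
        if (regex != nullptr && xmlRegexpExec(regex, str) == 1)
        {
            std::cout << "Match!" << std::endl;
        }

        // Release with the libxml2 deallocators rather than plain free().
        xmlRegFreeRegexp(regex);
        xmlFree(pattern);
        xmlFree(str);
    }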
    

    Output:

    Match!

    I also attempted to use XMLString::patternMatch from the Xerces-C++ library, but it didn't seem to use an XML Schema compliant regex engine underneath. I couldn't tell which regex engine it uses, the documentation was sparse, and I couldn't find any examples online, so I gave up on it.
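
For comparison, a single subtraction like the one in the question can also be emulated in std::regex's default ECMAScript grammar with a negative lookahead, since "in [a-z] but not in [aeiou]" is the same as "not followed by [aeiou], then [a-z]". The following is a minimal sketch only, not a general XML Schema translator; the helper name subtractClass is hypothetical, and it assumes both arguments are plain bracket-expression contents with no nesting.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: builds an ECMAScript fragment that matches one
// character that is in [base] but not in [excluded]. Both arguments are
// the contents of a bracket expression, without the surrounding brackets.
std::string subtractClass(const std::string& base, const std::string& excluded)
{
    // The negative lookahead rejects excluded characters without
    // consuming input; the bracket expression then consumes one character.
    return "(?![" + excluded + "])[" + base + "]";
}

// Usage sketch: the XSD pattern [a-z-[aeiou]]+ rewritten for std::regex.
void subtractClassExample()
{
    const std::regex rx("(?:" + subtractClass("a-z", "aeiou") + ")+");
    assert(std::regex_match("bcdfg", rx));   // consonants only
    assert(!std::regex_match("bcdef", rx));  // contains the vowel 'e'
}
```

This only covers the simplest case. Nested subtractions, negated classes, and the other XML Schema specific constructs would need a real parser, which is where a library such as libxml2 is the more robust choice.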