Tags: c++, regex, c++17, regex-group

Composing complex regular expressions with "DEFINE" subexpressions in C++


I'm trying to write a regular expression in C++ to match a base64-encoded string. I'm quite familiar with writing complex regular expressions in Perl, so I started with that:

use strict;
use warnings;

my $base64_regex = qr{
(?(DEFINE)
    (?<B64>[A-Za-z0-9+/])
    (?<B16>[AEIMQUYcgkosw048])
    (?<B04>[AQgw])
)
^(
    ((?&B64){4})*
    (
        (?&B64){4}|
        (?&B64){2}(?&B16)=|
        (?&B64)(?&B04)={2}
    )
)?$}x;

# "Hello World!" base64 encoded 
my $base64 = "SGVsbG8gV29ybGQh";

if ($base64 =~ $base64_regex)
{
    print "Match!\n";
}
else
{
    print "No match!\n"
}

Output:

Match!

I then tried to implement a similar regular expression in C++:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::regex base64_regex(
        "(?(DEFINE)"
            "(?<B64>[A-Za-z0-9+/])"
            "(?<B16>[AEIMQUYcgkosw048])"
            "(?<B04>[AQgw])"
        ")"
        "^("
            "((?&B64){4})*"
            "("
                "(?&B64){4}|"
                "(?&B64){2}(?&B16)=|"
                "(?&B64)(?&B04)={2}"
            ")"
        ")?$");

    // "Hello World!" base64 encoded 
    std::string base64 = "SGVsbG8gV29ybGQh";

    if (std::regex_match(base64, base64_regex))
    {
        std::cout << "Match!" << std::endl;
    }
    else
    {
        std::cout << "No Match!" << std::endl;
    }
}

However, when I run the code, I get an exception telling me that it is not a valid regular expression.


Catching the exception and printing its what() string doesn't help much either. All it gives me is the following:

regex_error(error_syntax)
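
Since the message is this terse, it can help to wrap the constructor in a try/catch and check individual constructs one at a time. The following is a minimal sketch, with a hypothetical helper name (compiles_with_std_regex). The value reported by e.code() may differ between standard library implementations, so the helper only reports success or failure:

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the original post): returns true if
// `pattern` is accepted by std::regex's default ECMAScript grammar, and
// false if the constructor throws std::regex_error.
inline bool compiles_with_std_regex(const std::string& pattern)
{
    try {
        std::regex re(pattern);  // throws std::regex_error on invalid syntax
        (void)re;                // the compiled object itself is not needed
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Calling it with the DEFINE-based pattern should return false, while a plain pattern such as "(?:[A-Za-z0-9+/]{4})*" should return true.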

Obviously, I could get rid of the "DEFINE" block and inline my predefined subpatterns, but that would make the whole expression very difficult to read. I like to be able to maintain my own code when I come back to it a few years later, so that isn't really a good option.
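
For a pattern that is fixed in the source code, one way to drop the "DEFINE" block without losing readability is to build the pattern from named string fragments. This is only a sketch under stated assumptions: the helper names (make_base64_regex, is_base64) are hypothetical, the capturing groups are replaced by non-capturing ones, and it does not help when the pattern string comes from a user:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: assembles the base64 pattern from named fragments,
// which play the role of the Perl (?(DEFINE)...) subpatterns.
inline std::regex make_base64_regex()
{
    const std::string B64 = "[A-Za-z0-9+/]";       // any base64 digit
    const std::string B16 = "[AEIMQUYcgkosw048]";  // last digit before one '='
    const std::string B04 = "[AQgw]";              // last digit before two '='

    const std::string pattern =
        "(?:"
            "(?:" + B64 + "{4})*"                  // full 4-character groups
            "(?:" +
                B64 + "{4}|" +                     // final group, no padding
                B64 + "{2}" + B16 + "=|" +         // final group, one '='
                B64 + B04 + "={2}"                 // final group, two '='
            ")"
        ")?";
    return std::regex(pattern);                    // default ECMAScript grammar
}

// Hypothetical helper: std::regex_match only succeeds if the whole string
// matches, so the ^ and $ anchors are not needed.
inline bool is_base64(const std::string& s)
{
    static const std::regex re = make_base64_regex();
    return std::regex_match(s, re);
}
```

With this sketch, is_base64("SGVsbG8gV29ybGQh") returns true and is_base64("SGVsbG8") returns false.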

How can I get a similar regular expression to work in C++?

Note: This must all be done within a single "std::regex" object, because I am writing a library where users pass in a string to define their own regular expressions, and I want these users to be able to "DEFINE" similar subexpressions within their regex if they need to.


Solution

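  • I took a suggestion from the comments and checked out Boost.Regex, since it supports Perl regular expression syntax. "std::regex" has no Perl grammar: its default grammar is ECMAScript, which does not support "(?(DEFINE)...)", named groups such as "(?<B64>...)", or subroutine calls such as "(?&B64)", which is why the constructor throws. I gave Boost a try and it worked great!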

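    #include <boost/regex.hpp>
    #include <iostream>
    #include <string>
    
    int main()
    {
        // boost::regex::perl selects Boost's Perl-style grammar.
        boost::regex base64_regex(
            "(?(DEFINE)"
                "(?<B64>[A-Za-z0-9+/])"
                "(?<B16>[AEIMQUYcgkosw048])"
                "(?<B04>[AQgw])"
            ")"
            "("
                "((?&B64){4})*"
                "("
                    "(?&B64){4}|"
                    "(?&B64){2}(?&B16)=|"
                    "(?&B64)(?&B04)={2}"
                ")"
            ")?", boost::regex::perl);
    
        // "Hello World!" base64 encoded
        std::string base64 = "SGVsbG8gV29ybGQh";
    
        // Assumed usage: regex_match requires the whole string to match, so ^ and $ are not needed.
        std::cout << (boost::regex_match(base64, base64_regex) ? "Match!" : "No match!") << std::endl;
    }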