c++ re2

How to use the RE2 library when the number of match arguments is unknown


I am not able to use RE2::FullMatchN when the number of match arguments is determined at run time.

const RE2::Arg* args[10];
int n;
bool match = RE2::FullMatchN("abcd@abcd.com", "([^ @]+)@([^ @]+)", args, n);

In the end, I want to obtain two strings from the example above: abcd and abcd.com


Solution

  • You can use RE2 as an object. Once the RE2 object has successfully parsed the regex, you can call its NumberOfCapturingGroups() method. Knowing how many capturing groups there are, you can dynamically allocate an array of pointers to RE2::Arg.

    I also suggest wrapping the regex in '(' and ')', since RE2 does not by default return a 0th argument holding the full match, as many other APIs do.

    Here is an example function:

    #include <string>
    #include <vector>
    #include <re2/re2.h>

    bool re2_full_match(const std::string & pattern, const std::string & str, std::vector<std::string> & results)
    {
        std::string wrapped_pattern = "(" + pattern + ")";
        RE2::Options opt;
        opt.set_log_errors(false);
        opt.set_case_sensitive(false);
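        /// Latin-1 instead of UTF-8; same effect as the legacy set_utf8(false).
        opt.set_encoding(RE2::Options::EncodingLatin1);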
        RE2 re2(wrapped_pattern, opt);
        if (!re2.ok()) {
            /// Failed to compile regular expression.
            return false;
        }
    
        /// Argument vector.
        std::vector<RE2::Arg> arguments;
        /// Vector of pointers to arguments.
        std::vector<RE2::Arg *> arguments_ptrs;
    
        /// Get number of arguments.
        std::size_t args_count = re2.NumberOfCapturingGroups();
    
        /// Adjust vectors sizes.
        arguments.resize(args_count);
        arguments_ptrs.resize(args_count);
        results.resize(args_count);
        /// Bind each argument to its result string and record a pointer to it.
        for (std::size_t i = 0; i < args_count; ++i) {
            /// Bind argument to string from vector.
            arguments[i] = &results[i];
            /// Save pointer to argument.
            arguments_ptrs[i] = &arguments[i];
        }
    
        /// std::string converts implicitly to the text parameter; FullMatchN takes the count as int.
        return RE2::FullMatchN(str, re2, arguments_ptrs.data(), static_cast<int>(args_count));
    }
    

    But in the spirit of regex, I suggest anchoring the pattern yourself with ^...$ instead of relying on a full match, and turning re2_full_match into a find function:

    bool re2_find(const std::string & pattern, const std::string & str, std::vector<std::string> & results)
    {
        std::string wrapped_pattern = "(" + pattern + ")";
        RE2::Options opt;
        opt.set_log_errors(false);
        opt.set_case_sensitive(false);
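        /// Latin-1 instead of UTF-8; same effect as the legacy set_utf8(false).
        opt.set_encoding(RE2::Options::EncodingLatin1);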
        RE2 re2(wrapped_pattern, opt);
        if (!re2.ok()) {
            /// Failed to compile regular expression.
            return false;
        }
    
        /// Argument vector.
        std::vector<RE2::Arg> arguments;
        /// Vector of pointers to arguments.
        std::vector<RE2::Arg *> arguments_ptrs;
    
        /// Get number of arguments.
        std::size_t args_count = re2.NumberOfCapturingGroups();
    
        /// Adjust vectors sizes.
        arguments.resize(args_count);
        arguments_ptrs.resize(args_count);
        results.resize(args_count);
        /// Bind each argument to its result string and record a pointer to it.
        for (std::size_t i = 0; i < args_count; ++i) {
            /// Bind argument to string from vector.
            arguments[i] = &results[i];
            /// Save pointer to argument.
            arguments_ptrs[i] = &arguments[i];
        }
    
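        /// FindAndConsumeN advances this view past the match (absl::string_view in newer RE2 releases).
        re2::StringPiece piece(str);
        return RE2::FindAndConsumeN(&piece, re2, arguments_ptrs.data(), static_cast<int>(args_count));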
    }