regex, boost-regex

How to match multiple tokens using regular expressions?


I need to use regular expressions to match special keys and values. There is one special case that I do not know how to handle.

The string looks like abcd/abcd. I need to match each individual token before the /. So I wrote (.)*/, and then found that it captures only one token (d). What's more, even if it captured everything I need, I still would not know how many tokens were matched.

So what should the correct regular expression be? The real case is much more complex than this example, so if it can be done with regular expressions, I would rather not write a tokenizer.


Solution

  • The Boost library that you are using can capture repeated groups into a stack, provided the library was compiled with the BOOST_REGEX_MATCH_EXTRA macro defined; otherwise the match_results object (what in the code below) won't have a member named captures. When you call boost::regex_search or boost::regex_match, pass the boost::match_extra flag, and every value matched by your (.)* (which matches and captures any single character, zero or more times) is pushed onto a stack that is accessible via the captures member of the sub_match object. The size of that stack tells you how many tokens were matched.

    Here is a demo method from the official Boost site:

    // Requires Boost.Regex built with BOOST_REGEX_MATCH_EXTRA defined.
    #include <boost/regex.hpp>
    #include <iostream>
    #include <string>
    
    void print_captures(const std::string& regx, const std::string& text)
    {
       boost::regex e(regx);
       boost::smatch what;
       std::cout << "Expression:  \"" << regx << "\"\n";
       std::cout << "Text:        \"" << text << "\"\n";
       if(boost::regex_match(text, what, e, boost::match_extra))
       {
          unsigned i, j;
          std::cout << "** Match found **\n   Sub-Expressions:\n";
          for(i = 0; i < what.size(); ++i)
             std::cout << "      $" << i << " = \"" << what[i] << "\"\n";
          std::cout << "   Captures:\n";
          for(i = 0; i < what.size(); ++i)
          {
             std::cout << "      $" << i << " = {";
             for(j = 0; j < what.captures(i).size(); ++j)
             {
                if(j)
                   std::cout << ", ";
                else
                   std::cout << " ";
                std::cout << "\"" << what.captures(i)[j] << "\"";
             }
             std::cout << " }\n";
          }
       }
       else
       {
          std::cout << "** No Match found **\n";
       }
    }
    
    int main(int , char* [])
    {
       print_captures("(.*)bar|(.*)bah", "abcbar");
       return 0;
    }
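
If rebuilding Boost with BOOST_REGEX_MATCH_EXTRA is not an option, a different technique can give the same list of tokens and their count: first isolate the part before the /, then iterate over it with a regex iterator. The sketch below swaps in the standard library's std::regex instead of Boost's repeated captures. The helper name tokens_before_slash and the token pattern "." are assumptions made for illustration; replace "." with whatever your real tokens look like.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or of the demo above):
// returns every single-character token that precedes the first '/'.
std::vector<std::string> tokens_before_slash(const std::string& text)
{
    std::vector<std::string> tokens;

    // Step 1: isolate the part of the input before the first '/'.
    static const std::regex prefix_re("^([^/]*)/");
    std::smatch m;
    if (!std::regex_search(text, m, prefix_re))
        return tokens;  // no '/' in the input, so there are no tokens

    // Step 2: walk over the prefix one token at a time.
    // "." is an assumed stand-in for the real token pattern.
    static const std::regex token_re(".");
    const std::string prefix = m[1].str();
    for (std::sregex_iterator it(prefix.begin(), prefix.end(), token_re), end;
         it != end; ++it)
    {
        tokens.push_back(it->str());
    }

    return tokens;
}
```

Calling tokens_before_slash("abcd/abcd") collects the four tokens a, b, c and d, and the size of the returned vector is the token count the question asks about. The trade-off is two passes over the input instead of one, in exchange for not depending on a specially built library.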