
How do I find the offset of a matching string using RE2?


RE2 is a modern regular expression engine from Google. I want to use RE2 in a program that currently uses GNU regex. My problem is finding out where the match occurred: RE2 gives me the text that matched, but I need its offset in the input. My current plan is to take the string RE2 returns and call find() on the original C++ string, but that seems wasteful. I've gone through the RE2 documentation and can't figure out how to do it. Any ideas?


Solution

  • Capture into a re2::StringPiece instead of a std::string. A StringPiece does not copy the text, so its .data() points into the original input; subtracting the start of the input gives the offset of the match.

    Consider this program. In each of the tests, result.data() is a pointer into the original const char* or std::string.

    #include <re2/re2.h>
    #include <iostream>
    #include <string>

    int main() {
      // Compile the pattern once; passing a string literal to PartialMatch
      // would construct (and compile) a temporary RE2 on every call.
      const RE2 vowel("([aeiou])");

      { // Try it once with character pointers
        const char *text[] = { "Once", "in", "Persia", "reigned", "a", "king" };

        for (const char *word : text) {
          // A StringPiece capture does not copy: result.data() points into word.
          re2::StringPiece result;
          if (RE2::PartialMatch(word, vowel, &result))
            std::cout << "First lower-case vowel at " << result.data() - word << "\n";
          else
            std::cout << "No lower-case vowel\n";
        }
      }

      { // Try it once with std::string
        std::string text[] = { "While", "I", "pondered,", "weak", "and", "weary" };

        for (const std::string &word : text) {
          // Here result.data() points into word's buffer, so subtract word.data().
          re2::StringPiece result;
          if (RE2::PartialMatch(word, vowel, &result))
            std::cout << "First lower-case vowel at " << result.data() - word.data() << "\n";
          else
            std::cout << "No lower-case vowel\n";
        }
      }
    }