c++ c++11 std

Is there a way to extract the search pattern from a C++ regex? (Not a question about regexes, but about #include <regex>)


Given a C++ std::regex, is there a way to figure out what that regex would search for, i.e. to get back the string that you passed in when constructing it?

I've seen https://en.cppreference.com/w/cpp/regex/basic_regex and it doesn't list anything helpful here. The only thing I can think of is generating every possible string and seeing what matches, but that seems like an insane solution. sizeof(regex) is a constant 32 no matter how long I make the search text, so some abusive memory manipulation is out of the question. I've tried casting to strings and char *s, thinking that maybe most of the other details would be known at compile time and handled in the type. This feels like something that should be doable. This is a large codebase that I do not own, so something like wrapping the regex in a class that can be implicitly converted to a regex but that also separately stores the search as a string is out of the question.


Solution

  • No.*

    There seems to be a fundamental misunderstanding here of what a regular expression is. A regex is a directed graph that represents a pattern. When you apply it to a string, you are doing nothing more than testing whether the string matches the pattern by successfully traversing the graph.

    Humans like convenience. It would be really painful to have to construct the graph by hand, or to have to use a specialized graph-drawing program to generate the code that constructs it.

    So instead we use a very convenient text string that represents the graph, which we tend to refer to as a “regular expression”, even though it is just a textual representation of a regular expression graph. Regex construction takes that string and builds the actual DFA (or NFA) graph. Thereafter we use that graph to match against any strings we wish.
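    To make the "construct once, match many times" idea concrete, here is a minimal sketch. The function names and the date pattern are made up for illustration. It also shows the little that a compiled std::regex does let you inspect: mark_count() and flags() exist, but there is no member that returns the original pattern text.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Illustrative helper: the pattern text is consumed by the constructor and
// compiled into the library's internal representation exactly once.
bool looks_like_date(const std::string& input) {
    static const std::regex date_pattern(R"((\d{4})-(\d{2})-(\d{2}))");
    // regex_match requires the whole input to match the pattern.
    return std::regex_match(input, date_pattern);
}

// mark_count() reports the number of marked sub-expressions (capture groups).
unsigned capture_group_count(const std::regex& re) {
    return re.mark_count();
}

// Small self-check tying the two helpers together.
void self_check() {
    assert(looks_like_date("2024-01-31"));
    assert(!looks_like_date("31/01/2024"));
    assert(capture_group_count(std::regex("(a)(b)")) == 2);
    // flags() reports the syntax options used at construction (ECMAScript by default).
    assert(std::regex("a").flags() & std::regex::ECMAScript);
}
```

    Calling self_check() from main() exercises both helpers. Note that nothing in this interface hands the string `(\d{4})-(\d{2})-(\d{2})` back to you.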

    * So, yes, we could get the textual regex string out of the graph if we were to write the code to decompose the constructed graph back into a character string, but no one ever needs to do that! Why would we?

    ** The question, as asked, is unclear. It appears to me that the OP is asking how to get the string used to construct the regex back out of a compiled regex.
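    For readers who do control the code that constructs the regex (which the OP says they do not), the usual workaround is to keep the pattern string next to the compiled object. The sketch below is only that: NamedRegex, pattern() and compiled() are hypothetical names, not anything from the standard library.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Hypothetical wrapper: stores the source text alongside the compiled regex
// so the text can be read back later.
class NamedRegex {
public:
    explicit NamedRegex(std::string pattern,
                        std::regex::flag_type flags = std::regex::ECMAScript)
        : pattern_(std::move(pattern)), compiled_(pattern_, flags) {}

    // The string that was used to build the regex.
    const std::string& pattern() const { return pattern_; }

    // The compiled regex, for passing to std::regex_search and friends.
    const std::regex& compiled() const { return compiled_; }

private:
    std::string pattern_;  // declared first, so it is initialized before compiled_
    std::regex compiled_;
};

// Example use: search with the compiled form, report using the stored text.
bool contains_number(const std::string& text) {
    static const NamedRegex digits("[0-9]+");
    return std::regex_search(text, digits.compiled());
}
```

    The sketch uses an explicit compiled() accessor rather than an implicit conversion operator. std::regex_search and std::regex_match are function templates, and template argument deduction does not consider user-defined conversions, so a wrapper object could not be passed to them directly; an implicit conversion would only help with non-template functions that take a const std::regex&.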