c++ parsing enums regex-group boost-regex

Parsing *.cpp file containing enum using boost::regex.


I have already parsed the file and split its content into enums and enum classes.

std::string sourceString = readFromFile(typesHDestination);

// Compile both patterns once instead of on every loop iteration.
// "\\{" passes \{ to the regex engine; "\{" and "\," are not valid C++ escapes.
const boost::regex enumRegex("(?<data_type>enum|enum\\s+class)\\s+(?<enum_name>\\w+)\\s*\\{(?<content>[^}]+?)\\s*\\}\\s*");
const boost::regex itemRegex("(?<name>\\w+)(?:(?:\\s*=\\s*(?<value>[^,\\s]+)(?:(?:,)|(?:\\s*)))|(?:(?:\\s*)|(?:,)))");

boost::smatch xResults;
std::string::const_iterator Start = sourceString.cbegin();
std::string::const_iterator End = sourceString.cend();

// Outer loop: one iteration per enum / enum class found in the source.
while (boost::regex_search(Start, End, xResults, enumRegex))
{
    std::cout << xResults["data_type"]
        << " " << xResults["enum_name"] << "\n{\n";

    std::string::const_iterator ContentStart = xResults["content"].first;
    std::string::const_iterator ContentEnd = xResults["content"].second;
    boost::smatch xResultsInner;

    // Inner loop: one iteration per enumerator inside the braces.
    while (boost::regex_search(ContentStart, ContentEnd, xResultsInner, itemRegex))
    {
        std::cout << xResultsInner["name"] << ": " << xResultsInner["value"] << std::endl;

        ContentStart = xResultsInner[0].second;
    }

    Start = xResults[0].second;
    std::cout << "}\n";
}

This works as long as the enums contain no comments.

I tried to add a named group <comment> to capture the comments inside enums, but failed every time. (\/{2}\s*.+) is my sample pattern for comments that start with a double slash.

I tested with an online regex tester and with boost::regex.

  1. The first step - from *.cpp file to <data_type> <enum_name> <content> regex:

(?'data_type'enum|enum\s+class)\s+(?'enum_name'\w+)\s*{\s*(?'content'[^}]+?)\s*}\s*

  2. From <content> to <name> <value> <comment> regex:

(?'name'\w+)(?:(?:\s*=\s*(?'value'[^\,\s/]+)(?:(?:,)|(?:\s*)))|(?:(?:\s*)|(?:,)))

The last one contains an error. Is there a way to fix it and also capture the comments in a group?


Solution

  • I agree that regex is not the best tool for parsing complicated data. I omitted a few important conditions from the question. First of all, I was parsing generated source code containing enums and enum classes, so the code was regular and held no surprises. In other words, I was parsing regular code with regex.

    The answer (the first step is unchanged, the second is fixed). How to parse enums/enum classes with regex:

    1. The first step - from *.cpp file to <data_type> <enum_name> <content> regex:

    (?'data_type'enum|enum\s+class)\s+(?'enum_name'\w+)\s*{\s*(?'content'[^}]+?)\s*}\s*

    2. From <content> to <name> <value> <comment> regex:

    ^\s*(?'name'\w+)(?:(?:\s*=\s*(?'value'[^,\n/]+))|(?:[^,\s/]))(?:(?:\s$)|(?:\s*,\s*$)|(?:[^/]/{2}\s(?'comment'.*$)))

    All tests passed; the original post showed the matched groups highlighted by color in a screenshot (image not included here).
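The two steps above can be sketched in C++ roughly as follows. This is a minimal sketch, not the exact code from the answer. It swaps in std::regex for boost::regex so that it compiles without Boost, and because std::regex has no named groups it uses numbered groups instead. The patterns are simplified versions written for this sketch, and the names Enumerator, EnumInfo, parseEnumBody and parseEnums are hypothetical. It assumes regular, generated code: one enumerator per line, only `//` comments, no `}` inside comments, no `/` inside values, and no underlying-type specifier such as `: int`.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// One parsed enumerator; value and comment are empty when absent.
struct Enumerator {
    std::string name;
    std::string value;
    std::string comment;
};

// One parsed enum or enum class (hypothetical type for this sketch).
struct EnumInfo {
    std::string kind;  // "enum" or "enum class"
    std::string name;
    std::vector<Enumerator> items;
};

// Step 2: split the text between the braces into name / value / comment.
// Assumes one enumerator per line and only `//` comments.
std::vector<Enumerator> parseEnumBody(const std::string& body) {
    // Groups: 1 = name, 2 = value (optional), 3 = comment (optional).
    static const std::regex lineRe(
        R"(^\s*(\w+)\s*(?:=\s*([^,/\s][^,/]*?))?\s*,?\s*(?://\s*(.*?))?\s*$)");
    std::vector<Enumerator> result;
    std::istringstream lines(body);
    std::string line;
    std::smatch m;
    while (std::getline(lines, line)) {
        // Blank lines and comment-only lines do not match and are skipped.
        if (std::regex_match(line, m, lineRe)) {
            result.push_back({m[1].str(), m[2].str(), m[3].str()});
        }
    }
    return result;
}

// Step 1: find every `enum` / `enum class` block in the source text.
std::vector<EnumInfo> parseEnums(const std::string& source) {
    // Groups: 1 = "enum" or "enum class", 2 = enum name, 3 = body.
    static const std::regex enumRe(
        R"(\b(enum(?:\s+class)?)\s+(\w+)\s*\{([^}]*)\})");
    std::vector<EnumInfo> result;
    for (std::sregex_iterator it(source.begin(), source.end(), enumRe), end;
         it != end; ++it) {
        result.push_back({it->str(1), it->str(2), parseEnumBody(it->str(3))});
    }
    return result;
}
```

Calling parseEnums on the contents of a file returns one EnumInfo per enum found, with each enumerator's value and comment left empty when the line has none. Processing the body line by line keeps the second pattern anchored with ^ and $ without relying on a multiline flag.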