c++ string-parsing

Simple (mostly) variable parser


In one of my projects, I need to be able to provide a very simple variable find-and-replace parser (mostly for use in paths). Variables are used primarily during startup and occasionally to access files (not the program's primary function, just loading resources), so the parser need not be high-performance. I would greatly prefer it to be thread-safe, however.

The parser needs to be able to store a set of variables (map<string, string> at the moment) and be able to replace tokens with the corresponding value in strings. Variable values may contain other variables, which will be resolved when the variable is used (not when it is added, as variables may be added over time).

The current variable grammar looks something like:

$basepath$/resources/file.txt
/$drive$/$folder$/path/file

My current parser uses a pair of stringstreams ("output" and "varname"), writes to the "output" stream until it finds the first $, the "varname" stream until the second $, then looks up the variable (using the contents of varname.str()). It's very simple and works nicely, even when recursing over variable values.

String Parse(String input)
{
    stringstream output, varname;
    bool dest = false;
    size_t total = input.length();
    size_t pos = 0;
    while ( pos < total )
    {
        char inchar = input[pos];
        if ( inchar != '$' )
        {
            // dest is true while we are between a pair of $ markers
            if ( dest ) varname << inchar;
            else output << inchar;
        } else {
            // Is a varname start/end
            if ( !dest )
            {
                varname.str("");  // clear() only resets the error flags; str("") empties the buffer
                dest = true;
            } else {
                // Is an end
                auto Variable = mVariables.find(varname.str());
                if ( Variable != mVariables.end() )
                    output << Parse(Variable->second);
                dest = false;
            }
        }

        ++pos;
    }

    return output.str();
}

(error checking and such removed)

However, that method fails me when I try to apply it to my desired grammar. I would like something similar to what Visual Studio uses for project variables:

$(basepath)/resources/file.txt
/$(drive)/$(folder)/path/file

I would also like to be able to do:

$(base$(path))/subdir/file

Recursing in the variable name has run me into a wall, and I'm not sure the best way to proceed.

I have, at the moment, two possible concepts:

Iterate over the input string until I find a $, look for a ( as the next character, then find the matching ) (counting levels in and out until the proper closing paren is reached). Send that bit off to be parsed, then use the returned value as the variable name. This seems like it will be messy and cause a lot of copying, however.

The second concept is to use a char *, or perhaps char * &, and move that forward until I reach a terminating null. The parser function can use the pointer in recursive calls to itself while parsing variable names. I'm not sure how best to implement this technique, besides having each call keep track of the name it's parsed out, and append the returned value of any calls it makes.
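As a rough illustration of the second concept, here is a minimal sketch that uses a std::string plus a size_t cursor passed by reference instead of a char * &. The names ParseSpan, VariableMap, insideName and depth are hypothetical, and two behaviours are assumptions rather than requirements from the post: unknown variables expand to nothing, and a depth limit of 32 guards against self-referencing values.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> VariableMap;

// Hypothetical sketch: parses input starting at pos and advances pos as it goes.
// When insideName is true, the call stops at (and consumes) the ')' that closes
// the variable name currently being read.
std::string ParseSpan(const std::string& input, std::string::size_type& pos,
                      const VariableMap& vars, bool insideName, int depth = 0)
{
    std::string result;
    while (pos < input.length())
    {
        const char c = input[pos];
        if (insideName && c == ')')
        {
            ++pos;                       // consume the ')' that ends this name
            return result;
        }
        if (c == '$' && pos + 1 < input.length() && input[pos + 1] == '(')
        {
            pos += 2;                    // skip "$("
            // The name may itself contain variables, so read it recursively.
            const std::string name = ParseSpan(input, pos, vars, true, depth);
            const VariableMap::const_iterator it = vars.find(name);
            if (it != vars.end() && depth < 32)   // assumed recursion limit
            {
                // Values are resolved on use, each with its own cursor.
                std::string::size_type valuePos = 0;
                result += ParseSpan(it->second, valuePos, vars, false, depth + 1);
            }
            continue;                    // unknown names expand to nothing (assumption)
        }
        result += c;
        ++pos;
    }
    return result;
}
```

Each call only appends to its own result string, so the copying is limited to the resolved names and values. A top-level call would start with pos set to 0 and insideName set to false.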

The project need only compile in VS2010, so STL streams and strings, the supported bits of C++0x, and Microsoft-specific features are all fair game (a generic solution is preferable in case those reqs change, but it's not necessary at this point). Using other libraries is no good, though, especially not Boost.

Both my ideas seem like they're more complicated and messier than is needed, so I'm looking for a nice clean way of handling this. Code, ideas or documents discussing how best to do it are all very much welcome.


Solution

  • A simple solution is to search for the first ')' in the string, then move backwards to see whether it is preceded by an identifier that is itself preceded by "$(". If so, replace the whole "$(identifier)" with its value and restart the scan from the beginning. If not, move on to the next ')'. When there are no more ')' characters, you're finished.

    To explain: by searching for a ) you can be sure that you're finding a complete identifier for your substitution, which then has the chance to contribute to some other identifier used in a subsequent substitution.

    EXAMPLE

    Had a great time on $($(day)$(month)), did you?
    
    Dictionary: "day" -> "1", "month" -> "April", "1April" -> "April Fools Day"
    
    Had a great time on $($(day)$(month)), did you?
                               ^ find this
    Had a great time on $($(day)$(month)), did you?
                          ^^^^^^ back up to match this complete substitution
    Had a great time on $(1$(month)), did you?
                          ^ substitution made, restart entire process...
    Had a great time on $(1$(month)), did you?
                                  ^ find this
    etc.
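The steps above could be sketched in C++ roughly as follows. This is only one possible reading of the approach: Expand, VariableMap and maxSubstitutions are hypothetical names, "identifier" is taken to mean any text without parentheses, unknown variables are left in place, and the substitution cap is an assumed guard against values that refer to themselves.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> VariableMap;

// Hypothetical helper: expands $(name) tokens innermost-first by repeatedly
// finding a ')' and backing up to the "$(" in front of it.
std::string Expand(std::string text, const VariableMap& vars,
                   int maxSubstitutions = 1000)
{
    std::string::size_type searchFrom = 0;
    while (maxSubstitutions > 0)
    {
        // Step 1: find the next ')' that has not been ruled out yet.
        const std::string::size_type close = text.find(')', searchFrom);
        if (close == std::string::npos)
            break;                        // no ')' left: finished
        searchFrom = close + 1;           // by default, carry on after this ')'
        if (close < 2)
            continue;                     // no room for "$(" in front of it

        // Step 2: back up to the nearest "$(" before that ')'.
        const std::string::size_type open = text.rfind("$(", close - 2);
        if (open == std::string::npos)
            continue;

        // The text in between only counts as a name if it has no parentheses.
        const std::string name = text.substr(open + 2, close - open - 2);
        if (name.find_first_of("()") != std::string::npos)
            continue;

        const VariableMap::const_iterator it = vars.find(name);
        if (it == vars.end())
            continue;                     // unknown variable: left untouched (assumption)

        // Step 3: substitute, then restart the scan from the beginning.
        text.replace(open, close - open + 1, it->second);
        searchFrom = 0;
        --maxSubstitutions;
    }
    return text;
}
```

With the dictionary from the example, passing "Had a great time on $($(day)$(month)), did you?" to this sketch yields "Had a great time on April Fools Day, did you?". Because the scan restarts after every substitution, variables that appear inside a substituted value are resolved on a later pass.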