c++ regex boost boost-regex boost-xpressive

Tokenize a string excluding delimiters inside quotes


First let me say, I have gone thoroughly through all other solutions to this problem on SO, and although they are very similar, none fully solve my problem.

I need to extract all tokens, dropping the surrounding quotes from the quoted ones, using boost regex.

The regex I think I need to use is:

sregex pattern = sregex::compile("\"(?P<token>[^\"]*)\"|(?P<token>\\S+)");

But I get an error of:

named mark already exists

The solution posted for C# seems to work with a duplicate named mark, since the two groups sit in different branches of an OR expression:

Regular Expression to split on spaces unless in quotes


Solution

  • I answered a very similar question here:

    How to make my split work only on one real line and be capable to skip quoted parts of string?

    The example code

    • uses Boost Spirit
    • supports quoted strings, partially quoted fields, user defined delimiters, escaped quotes
    • supports many (diverse) output containers generically
    • supports models of the Range concept as input (includes char[], e.g.)

    Tested with a relatively wide range of compiler versions and Boost versions.

    https://gist.github.com/bcfbe2b5f071c7d153a0
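
For simple inputs, a regex-only route may also be enough. The error in the question comes from using the name `token` for both alternatives, so one way around it is to use two ordinary capture groups and take whichever one participated in the match. The following is a minimal sketch under stated assumptions, not the code from the gist: it swaps in `std::regex` (C++11) for Boost so that it is self-contained, and the helper name `tokenize` is hypothetical. It does not handle escaped quotes or partially quoted fields; the Spirit version above is the one that covers those.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): splits on whitespace,
// keeps double-quoted runs together, and drops the quotes themselves.
std::vector<std::string> tokenize(const std::string& input) {
    // Group 1: contents of a quoted token; group 2: a bare token.
    // Two unnamed groups avoid the duplicate-name problem entirely.
    static const std::regex pattern(R"re("([^"]*)"|(\S+))re");

    std::vector<std::string> tokens;
    for (std::sregex_iterator it(input.begin(), input.end(), pattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // Only one of the two groups participates in any given match.
        tokens.push_back(m[1].matched ? m[1].str() : m[2].str());
    }
    return tokens;
}
```

Calling `tokenize("foo \"bar baz\" qux")` should yield the three tokens `foo`, `bar baz`, and `qux`. The same two-group idea should carry over to Boost.Regex or Boost.Xpressive, though that is not tested here.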