c# regex escaping

Regex split preserving strings and escape character


I need to split a string in C#, using space as the delimiter while keeping quoted substrings intact. That part works. In addition, I want to allow the escape sequence \" inside a quoted string, so that a quoted string can itself contain quotes.

Example of what I need:

One Two "Three Four" "Five \"Six\""

To:

  • One
  • Two
  • Three Four
  • Five "Six"

This is the regex I am currently using. It works for every case except "Five \"Six\"":

    // Split on spaces unless in quotes
    List<string> matches = Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
        .Cast<Match>()
        .Select(x => x.Value.Trim('"'))
        .ToList();

I'm looking for any regex that would do the trick.
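To make the failure concrete, here is a minimal sketch that runs the pattern above on the example input. The class name OriginalPatternDemo and the Split helper are only illustrative names, not part of the original code. The lazy .+? stops at the first " it finds, which is the escaped one, so the last quoted string is cut in two.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // The pattern from the question, unchanged.
    private const string Pattern = @"[\""].+?[\""]|[^ ]+";

    // Hypothetical helper: applies the pattern and trims surrounding quotes.
    public static string[] Split(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Value.Trim('"'))
             .ToArray();

    public static void Main()
    {
        // Runtime value: One Two "Three Four" "Five \"Six\""
        var input = "One Two \"Three Four\" \"Five \\\"Six\\\"\"";

        // Brackets make the token boundaries visible.
        foreach (var token in Split(input))
            Console.WriteLine("[" + token + "]");
        // [One]
        // [Two]
        // [Three Four]
        // [Five \]
        // [Six\]
    }
}
```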


Solution

  • You can use

    var input = "One Two \"Three Four\" \"Five \\\"Six\\\"\"";
    // Console.WriteLine(input); // => One Two "Three Four" "Five \"Six\""
    List<string> matches = Regex.Matches(input, @"(?s)""(?<r>[^""\\]*(?:\\.[^""\\]*)*)""|(?<r>\S+)")
        .Cast<Match>()
        .Select(x => Regex.Replace(x.Groups["r"].Value, @"\\(.)", "$1"))
        .ToList();
    foreach (var s in matches)
        Console.WriteLine(s);
    


    The result is

    One
    Two
    Three Four
    Five "Six"
    

    The (?s)"(?<r>[^"\\]*(?:\\.[^"\\]*)*)"|(?<r>\S+) regex matches:

    • (?s) - the inline equivalent of RegexOptions.Singleline, which makes . match newlines, too
    • "(?<r>[^"\\]*(?:\\.[^"\\]*)*)" - a ", then Group "r" capturing zero or more chars other than " and \, followed by zero or more repetitions of an escaped char (a \ plus any char) and zero or more chars other than " and \, and then a closing "
    • | - or
    • (?<r>\S+) - Group "r": one or more non-whitespace chars.
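The breakdown above can also be written into the pattern itself. The following is a sketch, assuming the same pattern is acceptable with RegexOptions.IgnorePatternWhitespace added so that it can span several lines with # comments. The class name VerbosePatternSketch and the RawTokens helper are made up for illustration. It returns the Group "r" values before any unescaping, which shows that both alternatives feed the same group.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class VerbosePatternSketch
{
    // Intended to be the same pattern as in the answer, spread over several lines.
    // Whitespace and # comments inside the pattern are ignored because of
    // RegexOptions.IgnorePatternWhitespace; a doubled quote is a literal quote
    // in a C# verbatim string.
    private static readonly Regex Tokenizer = new Regex(@"
          ""                        # opening quote
          (?<r>                     # group r: content between the quotes
              [^""\\]*              #   chars other than quote and backslash
              (?: \\ . [^""\\]* )*  #   backslash plus any char, then more plain chars
          )
          ""                        # closing quote
        |                           # or
          (?<r> \S+ )               # group r again: run of non-whitespace chars
        ",
        RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns the raw group r values, escapes still in place.
    public static string[] RawTokens(string input) =>
        Tokenizer.Matches(input)
                 .Cast<Match>()
                 .Select(m => m.Groups["r"].Value)
                 .ToArray();

    public static void Main()
    {
        // Runtime value: One Two "Three Four" "Five \"Six\""
        var input = "One Two \"Three Four\" \"Five \\\"Six\\\"\"";
        foreach (var token in RawTokens(input))
            Console.WriteLine(token);
        // One
        // Two
        // Three Four
        // Five \"Six\"
    }
}
```

.NET allows the same group name in both alternatives, so x.Groups["r"] holds the token whichever branch matched. The backslashes are still present at this stage.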

    The .Select(x => Regex.Replace(x.Groups["r"].Value, @"\\(.)", "$1")) step takes the Group "r" value and unescapes every escaped char by deleting the \ in front of it.
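The unescaping step can be tried on its own. This is a small sketch with an illustrative Unescape helper (the name is not from the answer) that wraps the same Regex.Replace call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Hypothetical helper: replaces a backslash plus the char after it
    // with just that char.
    public static string Unescape(string s) => Regex.Replace(s, @"\\(.)", "$1");

    public static void Main()
    {
        // Runtime value of the argument: Five \"Six\"
        Console.WriteLine(Unescape(@"Five \""Six\"""));  // Five "Six"

        // An escaped backslash collapses to a single backslash.
        Console.WriteLine(Unescape(@"C:\\temp"));        // C:\temp
    }
}
```

This second pattern does not carry the (?s) flag, so . does not match a newline there. A backslash that sits directly before a newline would be left as is.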