Tags: c#, linq

Extract values from a string into arrays


I have a string like this:

john "is my best buddy" and he loves "strawberry juice"

I want to:

  • Extract the text inside double quotes into a string array (array1).
  • Split the text outside the double quotes on spaces and put the resulting words into another string array (array2).

Expected output:

array1[0]: is my best buddy

array1[1]: strawberry juice

array2[0]: john

array2[1]: and

array2[2]: he

array2[3]: loves

Any help is appreciated.


Solution

  • This is a natural fit for regular expressions: one pattern with two named alternatives, one for quoted phrases and one for bare words.

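    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    // Verbatim string: "" inside @"..." is a literal double quote.
    var str = @"john ""is my best buddy"" and he loves ""strawberry juice""";

    // Alternative 1 captures a quoted phrase (without its quotes) in group "quoted";
    // alternative 2 captures a bare word in group "word".
    var regex = new Regex("\"(?'quoted'[^\"]+)\"|(?'word'\\w+)", RegexOptions.Compiled);

    var matches = regex.Matches(str);

    // Every match succeeds on exactly one of the two named groups.
    var quotes = matches.Cast<Match>()
                        .Where(m => m.Groups["quoted"].Success)
                        .Select(m => m.Groups["quoted"].Value)
                        .ToArray();   // { "is my best buddy", "strawberry juice" }

    var words = matches.Cast<Match>()
                       .Where(m => m.Groups["word"].Success)
                       .Select(m => m.Groups["word"].Value)
                       .ToArray();    // { "john", "and", "he", "loves" }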
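If you can assume the input always has balanced, unescaped double quotes, a regex is not strictly needed. The sketch below (an alternative to the regex answer, under that assumption) splits on `"`: pieces at odd indexes then lie inside quotes, and pieces at even indexes lie outside them and only need splitting on spaces. The names `QuoteSplitDemo` and `Extract` are hypothetical.

```csharp
using System;
using System.Linq;

public static class QuoteSplitDemo
{
    // Assumes balanced, unescaped double quotes in the input.
    // After Split('"'), odd-indexed pieces were inside quotes,
    // even-indexed pieces were outside them.
    public static (string[] Quoted, string[] Words) Extract(string input)
    {
        string[] parts = input.Split('"');

        // Text that was inside a pair of quotes.
        string[] quoted = parts
            .Where((part, index) => index % 2 == 1)
            .ToArray();

        // Text outside quotes, broken into words on spaces; empty entries
        // (from leading/trailing spaces around quotes) are dropped.
        string[] words = parts
            .Where((part, index) => index % 2 == 0)
            .SelectMany(part => part.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .ToArray();

        return (quoted, words);
    }

    public static void Main()
    {
        var (array1, array2) = Extract("john \"is my best buddy\" and he loves \"strawberry juice\"");
        Console.WriteLine(string.Join(" | ", array1)); // is my best buddy | strawberry juice
        Console.WriteLine(string.Join(" | ", array2)); // john | and | he | loves
    }
}
```

Unlike the regex version, this keeps punctuation attached to words (for example `buddy,` outside quotes stays `buddy,`) and returns an empty entry for an empty pair of quotes `""`.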