
How to use LINQ to group a list of strings on only certain strings


Example list of strings:

var test = new List<string>{
    "hdr1","abc","def","ghi","hdr2","lmn","opq","hdr3","rst","xyz"
};

I want to partition this list by "hdr*" so that each group starts with a header and contains the elements that follow it:

"hdr1","abc","def","ghi"  
"hdr2","lmn","opq"  
"hdr3","rst","xyz"  

I tried:

var result = test.GroupBy(g => g.StartsWith("hdr"));

but this gives me two groups:

"hdr1","hdr2","hdr3"  
"abc","def"..."xyz"  

What is the proper LINQ statement I should use? Let me emphasize that the strings following "hdr*" could be anything. The only thing they have in common is that they follow "hdr*".


Solution

  • You get two groups because one group is the group of elements starting with "hdr" and the other group is the group of elements not starting with "hdr". StartsWith returns a bool, so this results in two groups having the Keys false and true.

    You can use statement lambdas (lambdas with a block body) in LINQ method syntax. This lets us remember the most recent header in a captured variable:

    string header = null;
    var groups = test
        .Select(s => {
            if (s.StartsWith("hdr")) header = s;
            return s;
        })
        .Where(s => header != s)
        .GroupBy(s => header);
    

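    We store the most recent header in the variable header. The Where clause removes the header itself from the sequence, since the header already serves as the group key. This works only because LINQ to Objects evaluates the query lazily, one element at a time: each element passes through Select, Where and the GroupBy key selector before the next element is read, so header always holds the header of the section currently being processed.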

    The following test...

    foreach (var g in groups) {
        Console.WriteLine(g.Key);
        foreach (var item in g) {
            Console.WriteLine("    " + item);
        }
    }
    

    ... prints this with the given list:

    hdr1
        abc
        def
        ghi
    hdr2
        lmn
        opq
    hdr3
        rst
        xyz
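
    Because the query depends on a side effect, the order of evaluation matters. The following is a minimal sketch of that caveat, not part of the original answer: the class LazyGroupingDemo and the method GroupKeys are hypothetical names introduced only for illustration. It runs the same query twice, once as written and once with a ToList() call after Select, and returns the group keys.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical demo class, introduced only to illustrate the caveat.
    public static class LazyGroupingDemo
    {
        // Runs the header-tracking query and returns the group keys.
        // If materializeEarly is true, Select is fully evaluated before
        // Where and GroupBy run, so header already holds the last header.
        public static List<string> GroupKeys(bool materializeEarly)
        {
            var test = new List<string> {
                "hdr1","abc","def","ghi","hdr2","lmn","opq","hdr3","rst","xyz"
            };

            string header = null;
            IEnumerable<string> tagged = test.Select(s => {
                if (s.StartsWith("hdr")) header = s;
                return s;
            });

            if (materializeEarly) tagged = tagged.ToList();

            return tagged
                .Where(s => header != s)
                .GroupBy(s => header)
                .Select(g => g.Key)
                .ToList();
        }

        public static void Main()
        {
            // Lazy pipeline: prints hdr1,hdr2,hdr3
            Console.WriteLine(string.Join(",", GroupKeys(false)));
            // Select materialized first: prints hdr3
            Console.WriteLine(string.Join(",", GroupKeys(true)));
        }
    }
    ```

    In the second case every remaining element, including "hdr1" and "hdr2", lands in a single group keyed "hdr3", so the query should be kept as one lazy chain.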
    

    Alternatively, we can create lists with the header as the first element:

    string header = null;
    IEnumerable<List<string>> lists = test
        .Select(s => {
            if (s.StartsWith("hdr")) {
                header = s;
            }
            return s;
        })
        .GroupBy(s => header)
        .Select(g => g.ToList());
    

    This test...

    foreach (var l in lists) {
        foreach (var item in l) {
            Console.Write(item + " ");
        }
        Console.WriteLine();
    }
    

    ... prints:

    hdr1 abc def ghi
    hdr2 lmn opq
    hdr3 rst xyz
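
    If relying on a side effect inside a LINQ query feels fragile, the same partitioning can be written as a plain loop. This is a sketch under assumptions rather than the approach of the answer above: SplitByHeader and the class HeaderGrouping are hypothetical names, not LINQ methods, and the choice to put any elements that precede the first header into their own leading list is an assumption as well.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical helper class, not part of LINQ or the original answer.
    public static class HeaderGrouping
    {
        // Starts a new list at every element for which isHeader returns true.
        // Elements that appear before the first header go into a leading list.
        public static List<List<string>> SplitByHeader(
            IEnumerable<string> source, Func<string, bool> isHeader)
        {
            var result = new List<List<string>>();
            List<string> current = null;
            foreach (var s in source)
            {
                if (current == null || isHeader(s))
                {
                    current = new List<string>();
                    result.Add(current);
                }
                current.Add(s);
            }
            return result;
        }

        public static void Main()
        {
            var test = new List<string> {
                "hdr1","abc","def","ghi","hdr2","lmn","opq","hdr3","rst","xyz"
            };
            // Prints one line per section, header first.
            foreach (var list in SplitByHeader(test, s => s.StartsWith("hdr")))
            {
                Console.WriteLine(string.Join(" ", list));
            }
        }
    }
    ```

    With the list from the question this prints the same three lines as above. Because no state is captured by a lambda, the result does not depend on when or how often the sequence is enumerated.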