
How to remove duplicates in the middle


Given a sequence like the one below:

var list = new[] {"1a", "1b", "1c", "1d", "2a", "3a", "4a", "4b", "5a", "6a", "7a", "7b", "8a"}.Select(x => new { P1 = x.Substring(0,1), P2 = x.Substring(1,1)});

I'd like to remove the duplicates in the "middle" to end up with:

var expected = new[] {"1a", "1d", "2a", "3a", "4a", "4b", "5a", "6a", "7a", "7b", "8a"}.Select(x => new { P1 = x.Substring(0, 1), P2 = x.Substring(1, 1) });

So in any run of more than two consecutive items with the same P1, everything between the first and the last item is stripped out. It's important that I keep both the first and the last item of each run, though.


Solution

  • For those who don't want to use Aggregate and want a super short answer using a closure, here it is:

    var data = new[] { "1a", "1b", "1c", "1d", "2a", "3a", "4a", "4b", "1e", "5a", "6a", "7a", "7b", "8a" };
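    char priorKey = ' ';   // leading character of the previous item
    int currentIndex = 0;  // incremented each time the leading character changes
    
    // The key selector mutates the captured variables, so every run of consecutive
    // items with the same leading character gets its own (k, g) group key.
    var result2 = data.GroupBy((x) => x[0] == priorKey ? new { k = x[0], g = currentIndex } : new { k = priorKey = x[0], g = ++currentIndex })
        // Keep the first and last item of each run; Distinct collapses single-item runs to one value.
        .Select(i => new[] { i.First(), i.Last() }.Distinct())
        .SelectMany(i => i).ToArray();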
    

    Hat tip to @Slai for the code this is based on (I added a fix for the non-contiguous group issue).


    Here is how to do it with Aggregate. I didn't test all edge cases... just your test cases.

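    // Requires: using System.Collections.Generic; using System.Linq;
    var list = new[] { "1a", "1b", "1c", "1d", "2a", "3a", "4a", "4b", "5a", "6a", "7a", "7b", "8a" }
        .Aggregate(
            // Accumulator: the results so far, plus the first and last items of the current run.
            new { result = new List<string>(), first = "", last = "" },
            (store, given) =>
            {
                var result = store.result;
                var first = store.first;
                var last = store.last;
    
                if (first == "")
                {
                    // This is the very first item.
                    first = given;
                }
                else if (first[0] == given[0])
                {
                    // Same run: remember the latest item as the candidate last item.
                    last = given;
                }
                else
                {
                    // A new run starts: flush the first (and last, if any) item of the old run.
                    result.Add(first);
                    if (last != "")
                        result.Add(last);
                    first = given;
                    last = "";
                }
    
                return new { result = result, first = first, last = last };
            },
            (store) =>
            {
                // Flush the final run; checking first avoids adding "" when the input is empty.
                if (store.first != "") store.result.Add(store.first);
                if (store.last != "") store.result.Add(store.last);
                return store.result;
            })
        .Select(x => new { P1 = x.Substring(0, 1), P2 = x.Substring(1, 1) });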
    

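    The accumulator is an anonymous object that holds the result list so far, plus the first and last items seen in the current run.

    Each time the leading character changes, the first and last items of the finished run are added to the result, which drops everything in between. The final run is added once the sequence ends.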
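
For comparison, here is a minimal sketch of the same first-and-last-of-each-run logic written as a plain iterator instead of Aggregate or GroupBy. The names `RunExtensions` and `FirstAndLastOfRuns` are hypothetical and not part of LINQ. The sketch assumes that "duplicate" means consecutive items with an equal key under the default equality comparer, and it has only been reasoned through against the sample data above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunExtensions
{
    // Hypothetical helper: yields the first and last element of every run of
    // consecutive items that share the same key, dropping the items in between.
    public static IEnumerable<T> FirstAndLastOfRuns<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext()) yield break;   // empty input yields nothing

            T first = e.Current;              // first item of the current run
            T last = first;                   // most recent item of the current run
            int count = 1;                    // number of items in the current run

            while (e.MoveNext())
            {
                if (comparer.Equals(keySelector(first), keySelector(e.Current)))
                {
                    // Still in the same run: only the latest item matters.
                    last = e.Current;
                    count++;
                }
                else
                {
                    // The run ended: emit its first item, and its last if it had more than one.
                    yield return first;
                    if (count > 1) yield return last;
                    first = last = e.Current;
                    count = 1;
                }
            }

            // Emit the final run.
            yield return first;
            if (count > 1) yield return last;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var data = new[] { "1a", "1b", "1c", "1d", "2a", "3a", "4a", "4b", "1e", "5a", "6a", "7a", "7b", "8a" };
        var result = data.FirstAndLastOfRuns(x => x[0]).ToArray();
        Console.WriteLine(string.Join(", ", result));
        // prints: 1a, 1d, 2a, 3a, 4a, 4b, 1e, 5a, 6a, 7a, 7b, 8a
    }
}
```

Because the key selector is a parameter, the same helper should also work on the projected sequence from the question, e.g. `list.FirstAndLastOfRuns(x => x.P1)`. Unlike the closure version, it keeps no state outside the method, so enumerating the result more than once gives the same answer each time.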