Search code examples
c#linqienumerableset-theory

Finding symmetric difference with LINQ


I have two collections a and b. I would like to compute the set of items in either a or b, but not in both (a logical exclusive or). With LINQ, I can come up with this:

IEnumerable<T> Delta<T>(IEnumerable<T> a, IEnumerable<T> b)
{
    return a.Except (b).Union (b.Except (a));
}

I wonder if there are other more efficient or more compact ways of producing the difference between the two collections.

Edit 1: Jon Skeet posted a first solution which does not preserve the order of the items by relying on a HashSet. I wonder if there are other approaches which would preserve the order of a and b in the output.


Solution

  • Use HashSet<T> directly - it has a SymmetricExceptWith method:

    HashSet<T> data = new HashSet<T>(a);
    data.SymmetricExceptWith(b);
    

    EDIT: If you want to maintain the order, here's an alternative:

    HashSet<T> data = new HashSet<T>(a);
    data.IntersectWith(b);
    foreach (T t in a.Concat(b))
    {
        if (!data.Contains(t))
        {
            yield return t;
        }
    }
    

    This has the following important differences:

    • Both a and b are iterated over twice. In some cases that could be a very bad thing - you could call ToList on each of them to start with to retain a buffer.
    • If there are duplicates in either a or b, they will be yielded multiple times. If you wanted to avoid this you could keep a set of already-yielded values. At this point, it would be equivalent to:

      a.Concat(b).Except(a.Intersect(b))
      

    That's still only two set operations instead of the three in your original code though.