c# duplicates distinct hashset iequalitycomparer

Removing duplicate byte[]s from a collection


This will probably be an extremely simple question. I'm simply trying to remove duplicate byte[]s from a collection.

Since the default behaviour is to compare references, I thought that creating an IEqualityComparer would work, but it doesn't.

I've tried using a HashSet and LINQ's Distinct().

Sample code:

using System;
using System.Collections.Generic;
using System.Linq;

namespace cstest
{
    class Program
    {
        static void Main(string[] args)
        {
            var l = new List<byte[]>();
            l.Add(new byte[] { 5, 6, 7 });
            l.Add(new byte[] { 5, 6, 7 });
            Console.WriteLine(l.Distinct(new ByteArrayEqualityComparer()).Count());
            Console.ReadKey();
        }
    }

    class ByteArrayEqualityComparer : IEqualityComparer<byte[]>
    {
        public bool Equals(byte[] x, byte[] y)
        {
            return x.SequenceEqual(y);
        }

        public int GetHashCode(byte[] obj)
        {
            return obj.GetHashCode();
        }
    }
}

Output:

2

Solution

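  • Distinct calls GetHashCode before it ever calls Equals, and yours won't work "as is": obj.GetHashCode() on an array returns a reference-based hash, so two arrays with the same contents will almost always get different hash codes and never be compared with Equals. Try something like: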

    int result = 13 * obj.Length;
    for(int i = 0 ; i < obj.Length ; i++) {
        result = (17 * result) + obj[i];
    }
    return result;
    

    which ensures that arrays with equal contents produce equal hash codes, which is what Distinct and HashSet need.

    Personally, I would also write the equality test out by hand for performance (this version also handles nulls):

    if(ReferenceEquals(x,y)) return true;
    if(x == null || y == null) return false;
    if(x.Length != y.Length) return false;
    for(int i = 0 ; i < x.Length; i++) {
        if(x[i] != y[i]) return false;
    }
    return true;
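
For reference, here is a minimal sketch that puts the two snippets above into one comparer. The null check in GetHashCode, the unchecked block, and the DedupDemo helper are my own assumptions for illustration, not part of the original snippets, so treat this as one possible arrangement rather than the definitive version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: the hash and equality snippets from the answer combined into one comparer.
sealed class ByteArrayEqualityComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;     // same instance, or both null
        if (x == null || y == null) return false;   // exactly one is null
        if (x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
        {
            if (x[i] != y[i]) return false;
        }
        return true;
    }

    public int GetHashCode(byte[] obj)
    {
        // Assumption: treat null as hash 0 instead of throwing.
        if (obj == null) return 0;

        // Assumption: wrap on overflow even if the project enables overflow checking.
        unchecked
        {
            int result = 13 * obj.Length;
            for (int i = 0; i < obj.Length; i++)
            {
                result = (17 * result) + obj[i];
            }
            return result;
        }
    }
}

// Hypothetical helper for illustration only: counts arrays that differ by content.
static class DedupDemo
{
    public static int CountDistinct(IEnumerable<byte[]> items)
    {
        return items.Distinct(new ByteArrayEqualityComparer()).Count();
    }
}
```

With this comparer, the Distinct call from the question should count the two { 5, 6, 7 } arrays as one. The same instance can also be passed to the HashSet constructor, e.g. new HashSet<byte[]>(l, new ByteArrayEqualityComparer()), if you want a set rather than a lazily evaluated sequence.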