Search code examples
c#performancesethashset

Is a ISet<T> sort of "guaranteed" to have O(1) lookup speed?


Sometimes I have a method that I want to accept a broad IEnumerable<T> as input, but I'm doing an operation that might have O(2^n) impact if it's not a HashSet<T>.

Even worse when it's a lazy IEnumerable.

Example:

public static IEnumerable<Booking> ByIds(this IEnumerable<Booking> bookings, IEnumerable<int> ids)
{
    return bookings.Where(b => ids.Contains(b.Id));
}
  • The caller could call ToHashSet() before calling this method, but I don't want the caller to think about this. For me it's an implementation detail.

  • I could also force the input parameter to be a HashSet. But once again - I think this is an implementation detail the caller should not think about.

  • I could also always call ToHashSet() in ByIds to be sure, but with extreme large sets this feels like unnecessary performance waste.

So I am thinking about something like the following:

public static IEnumerable<Booking> ByIds(this IEnumerable<Booking> bookings, IEnumerable<int> ids)
{
    ids = ids.EnsureHashSet();

    return bookings.Where(b => ids.Contains(b.Id));
}

public static IEnumerable<T> EnsureHashSet<T>(this IEnumerable<T> source)
{
    if ( source is ISet<T> || source is IReadOnlySet<T> )
        return source;

    return source.ToHashSet();
}

My question is:

Is there some "intention" in the contract of a ISet<T> and IReadOnlySet<T> that it should be O(1) lookup performance?

I understand you can't prevent people doing weird stuff in an implementation, but if everyone is behaving properly is there at least the intention of it being a O(1) lookup?


Solution

  • No, there is no guarantee that a collection that implements the ISet<T> interface will have an O(1) Contains implementation. If you want this guarantee, you should restrict the matching to types with documented O(1) behavior:

    if (source is HashSet<T>)
    

    As a side-note, the IEnumerable<T> interface carries strong deferred execution semantics, and you shouldn't enumerate it more than once. There is no guarantee that enumerating an IEnumerable<T> multiple times will always yield the same items. Also each enumeration can be quite costly, because it might be hitting a database or a remote server. So don't do this if the ids is IEnumerable<int>:

    return bookings.Where(b => ids.Contains(b.Id)); // Code smell