Sometimes I have a method that I want to accept a broad IEnumerable<T>
as input, but I'm doing an operation that might have O(2^n) impact if it's not a HashSet<T>
.
Even worse when it's a lazy IEnumerable
.
Example:
public static IEnumerable<Booking> ByIds(this IEnumerable<Booking> bookings, IEnumerable<int> ids)
{
return bookings.Where(b => ids.Contains(b.Id));
}
The caller could call ToHashSet()
before calling this
method, but I don't want the caller to think about this. For me it's
an implementation detail.
I could also force the input parameter to be a HashSet
. But once
again - I think this is an implementation detail the caller should
not think about.
I could also always call ToHashSet()
in ByIds
to be sure, but with extreme large sets this feels like unnecessary performance waste.
So I am thinking about something like the following:
public static IEnumerable<Booking> ByIds(this IEnumerable<Booking> bookings, IEnumerable<int> ids)
{
ids = ids.EnsureHashSet();
return bookings.Where(b => ids.Contains(b.Id));
}
public static IEnumerable<T> EnsureHashSet<T>(this IEnumerable<T> source)
{
if ( source is ISet<T> || source is IReadOnlySet<T> )
return source;
return source.ToHashSet();
}
My question is:
Is there some "intention" in the contract of a ISet<T>
and IReadOnlySet<T>
that it should be O(1) lookup performance?
I understand you can't prevent people doing weird stuff in an implementation, but if everyone is behaving properly is there at least the intention of it being a O(1) lookup?
No, there is no guarantee that a collection that implements the ISet<T>
interface will have an O(1) Contains
implementation. If you want this guarantee, you should restrict the matching to types with documented O(1) behavior:
if (source is HashSet<T>)
As a side-note, the IEnumerable<T>
interface carries strong deferred execution semantics, and you shouldn't enumerate it more than once. There is no guarantee that enumerating an IEnumerable<T>
multiple times will always yield the same items. Also each enumeration can be quite costly, because it might be hitting a database or a remote server. So don't do this if the ids
is IEnumerable<int>
:
return bookings.Where(b => ids.Contains(b.Id)); // Code smell