I need to filter a large list of complex objects (20+ properties) into multiple sub-lists. To create the sub-lists, I have a list of filter specifications. The requirements are: a) an item must not end up in more than one sub-list, and b) it must be possible to get hold of all unassigned items after processing has finished.
Currently I use the following algorithm:
My initial list can contain over 400,000 objects, and I've noticed that both the filtering and the reduction of the working list take some time. So I would like to know:
Thanks in advance!
If you cannot leverage any indexes on the incoming list you are trying to classify, you are better off iterating through the whole list only once and classifying the items as you go. That way you avoid the unnecessary Remove and Except operations that are seriously hurting performance with pointless iterations and equality comparisons.
I was thinking about something along the lines of:
public static IDictionary<string, List<T>> Classify<T>(
    this IEnumerable<T> items,
    IDictionary<string, Predicate<T>> predicates,
    out List<T> defaultBucket)
{
    var classifiedItems = new Dictionary<string, List<T>>(predicates.Count);
    defaultBucket = new List<T>();

    // Create an empty bucket per predicate up front.
    foreach (var predicate in predicates)
    {
        classifiedItems.Add(predicate.Key, new List<T>());
    }

    // Single pass: each item goes into the first bucket whose predicate matches.
    foreach (var item in items)
    {
        var matched = false;
        foreach (var predicate in predicates)
        {
            if (predicate.Value(item))
            {
                matched = true;
                classifiedItems[predicate.Key].Add(item);
                break; // requirement a): an item belongs to at most one sub-list
            }
        }

        if (!matched)
        {
            defaultBucket.Add(item); // requirement b): collect unassigned items
        }
    }

    return classifiedItems;
}
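A usage sketch of the method above, with a hypothetical Order type standing in for the real 20+ property objects (the type, property names, and bucket names are all made up for illustration):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical item type; the real objects have 20+ properties.
public class Order
{
    public decimal Total { get; set; }
    public string Region { get; set; }
}

public static class ClassifyExtensions
{
    // The single-pass classifier described above.
    public static IDictionary<string, List<T>> Classify<T>(
        this IEnumerable<T> items,
        IDictionary<string, Predicate<T>> predicates,
        out List<T> defaultBucket)
    {
        var classifiedItems = new Dictionary<string, List<T>>(predicates.Count);
        defaultBucket = new List<T>();

        foreach (var predicate in predicates)
        {
            classifiedItems.Add(predicate.Key, new List<T>());
        }

        foreach (var item in items)
        {
            var matched = false;
            foreach (var predicate in predicates)
            {
                if (predicate.Value(item))
                {
                    matched = true;
                    classifiedItems[predicate.Key].Add(item);
                    break; // first matching bucket wins
                }
            }

            if (!matched)
            {
                defaultBucket.Add(item);
            }
        }

        return classifiedItems;
    }
}

public static class Demo
{
    public static void Main()
    {
        var orders = new List<Order>
        {
            new Order { Total = 500m,  Region = "EU" },
            new Order { Total = 50m,   Region = "US" },
            new Order { Total = 1200m, Region = "EU" },
            new Order { Total = 5m,    Region = "APAC" },
        };

        // An item lands in the first bucket whose predicate matches,
        // so the 1200 EU order goes to "large", not "eu".
        var predicates = new Dictionary<string, Predicate<Order>>
        {
            ["large"] = o => o.Total >= 1000m,
            ["eu"]    = o => o.Region == "EU",
        };

        var buckets = orders.Classify(predicates, out List<Order> rest);

        Console.WriteLine(buckets["large"].Count); // 1
        Console.WriteLine(buckets["eu"].Count);    // 1
        Console.WriteLine(rest.Count);             // 2
    }
}
```

Note that Dictionary enumeration order is an implementation detail in .NET; if the priority between overlapping predicates matters, passing an ordered `IList<KeyValuePair<string, Predicate<T>>>` instead would make that priority explicit.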
Any given predicate can be as complex as you need it to be. The only condition is that it takes a T and returns a bool. If that is not enough, nothing prevents you from implementing your own MyPredicate<???> with whatever signature you need.
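As a sketch of that last idea, here is a hypothetical two-parameter delegate (the name and signature are made up) for predicates that need external context at evaluation time, rather than capturing it in a closure:

```csharp
using System;

// Hypothetical custom predicate: evaluates an item against external context.
public delegate bool MyPredicate<in T, in TContext>(T item, TContext context);

public static class CustomPredicateDemo
{
    public static void Main()
    {
        // The threshold is supplied at call time instead of being
        // baked into the predicate when it is created.
        MyPredicate<decimal, decimal> isAtLeast =
            (value, threshold) => value >= threshold;

        Console.WriteLine(isAtLeast(1200m, 1000m)); // True
        Console.WriteLine(isAtLeast(50m, 1000m));   // False
    }
}
```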
EDIT: Edited the code to handle a "default bucket" that collects items which don't match any of the specified predicates.