I have the following code, which I would like to speed up:
foreach (var workflow in closedWorkflows)
{
var w = relatedWorkflows.FirstOrDefault(r => r.Fingerprint != workflow.Fingerprint && r.TradeDate == workflow.TradeDate);
if (w != null)
{
relatedWorkflows.Add(workflow);
}
}
relatedWorkflows
and closedWorkflows
are list of same type of objects.
I thought about creating lookup or dictionary with an anonymous class or tuple on Fingerprint
and TradeDate
, but one check is for equality and other one is for inequality.
Would it be a good approach to create lookups on TradeDate
, and then create lookups on Fingerprint
for each lookup list of TradeDate
?
Updated:
A BIG thanks to @Dmitry Bychenko.
In my test the difference is 41486 ms vs 26ms !!!
The main problem with performance is with
var w = relatedWorkflows.FirstOrDefault(...)
we scan relatedWorkflows
again and again, that's why we have O(closedWorkflows.Count * relatedWorkflows.Count)
time complexity.
We can get rid of relatedWorkflows.Count
with the help of hash-based collections (dictionary and hashset), e.g.
var dict = workflow
.GroupBy(item => item.TradeDate, item => item.FingerPrint)
.ToDictionary(group => group.Key, group => group.ToHashSet());
foreach (var workflow in closedWorkflows) {
// If we have a date, but NOT print we add it into relatedWorkflows
if (dict.TryGetValue(workflow.TradeDate, out var prints) &&
prints.Add(workflow.Fingerprint)) {
relatedWorkflows.Add(workflow);
}
}