I stumbled upon something while working out which events I need to create.
I had code like this:
var eventsToBeCreated =
requiredEventDates.Where(d => !events.Select(e => e.eventDay).Contains(d));
But it made me wonder whether this is a bad idea performance-wise, because I believe (though I am not sure) that the Select() gets evaluated separately for every element of requiredEventDates, so I changed it to:
var existingEventDays =
events.Select(e => e.eventDay);
var eventsToBeCreated =
requiredEventDates.Where(d => !existingEventDays.Contains(d));
But I was not sure about this either. As existingEventDays is an IEnumerable<DateTime>, I guess this would still lead to the enumerable being resolved multiple times? So I changed it to:
var existingEventDays =
events.Select(e => e.eventDay).ToList();
var eventsToBeCreated =
requiredEventDates.Where(d => !existingEventDays.Contains(d));
...to make sure that existingEventDays gets calculated only once.
Are my assumptions correct, or is this unnecessary, and would the first version offer the same performance as the third?
I'll assume you actually consume the whole query created with Where, for example by calling ToList(). If you don't consume it, then nothing in the Where lambda is executed. You're just creating a bunch of IEnumerable<T>s. See Deferred Execution.
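To make the deferred execution point concrete, here is a minimal sketch that counts how often the Where lambda runs. The sample dates and the counter variables (predicateCalls and friends) are made up for illustration and are not part of your code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; only the variable name mirrors the question.
var requiredEventDates = new List<DateTime>
{
    new DateTime(2024, 1, 1),
    new DateTime(2024, 1, 2),
};

int predicateCalls = 0;

// Building the query runs nothing: Where only stores the lambda.
var query = requiredEventDates.Where(d =>
{
    predicateCalls++;
    return d.Day > 1;
});
int callsBeforeConsuming = predicateCalls;

// Consuming the query (here with ToList) is what executes the lambda.
var result = query.ToList();
int callsAfterConsuming = predicateCalls;

Console.WriteLine($"before: {callsBeforeConsuming}, after: {callsAfterConsuming}");
// prints "before: 0, after: 2"
```

The lambda only runs once something enumerates the query, so any cost discussed here is paid at that point, not where the query is declared.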
Regarding the second snippet: you extracted the Select call into a variable, which indeed causes events.Select to be called only once, instead of once for every element in requiredEventDates. But again, due to deferred execution, calling Select itself is not very expensive. It is the looping that Contains does that is usually expensive, and because existingEventDays is still a lazy sequence, each Contains call enumerates events again.
Regarding the third snippet: you first made a list out of the dates from events. This loops through the entirety of events once. On top of that, Contains loops through the list for each element in requiredEventDates. So you essentially loop through the whole list one more time, and each Contains call is still a linear search.
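If you want to see the difference between the lazy and the ToList() variants for yourself, a rough sketch like the following counts how often the Select lambda runs in each. The sample data, the value tuples standing in for your event type, and the counter names are all assumptions for the sake of the example.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: value tuples stand in for the question's event type.
var events = new[]
{
    (name: "a", eventDay: new DateTime(2024, 1, 1)),
    (name: "b", eventDay: new DateTime(2024, 1, 2)),
    (name: "c", eventDay: new DateTime(2024, 1, 3)),
};
var requiredEventDates = new[]
{
    new DateTime(2024, 1, 2),
    new DateTime(2024, 1, 3),
    new DateTime(2024, 1, 4),
    new DateTime(2024, 1, 5),
};

int selectorCalls = 0;

// Second snippet: the Select result stays lazy, so every Contains call
// enumerates events again and re-runs the selector.
var lazyDays = events.Select(e => { selectorCalls++; return e.eventDay; });
var missingLazy = requiredEventDates.Where(d => !lazyDays.Contains(d)).ToList();
int lazySelectorCalls = selectorCalls;

// Third snippet: ToList runs the selector once per event, up front.
selectorCalls = 0;
var listedDays = events.Select(e => { selectorCalls++; return e.eventDay; }).ToList();
var missingList = requiredEventDates.Where(d => !listedDays.Contains(d)).ToList();
int listSelectorCalls = selectorCalls;

Console.WriteLine($"lazy: {lazySelectorCalls} selector calls, list: {listSelectorCalls}");
```

Both variants produce the same missing dates. The list version runs the selector exactly once per event, while the lazy version re-runs it on every Contains call (the exact count depends on where Contains finds a match). In both cases Contains itself remains a linear search.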
To avoid all this looping, you can instead put the dates into a HashSet:
var existingEventDays =
events.Select(e => e.eventDay).ToHashSet();
var eventsToBeCreated =
requiredEventDates.Where(d => !existingEventDays.Contains(d));
Now you only loop through events once, to create the set. Contains then looks up d in the set, a hash lookup that takes roughly constant time on average, which can be a lot faster than scanning a list.
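As far as I know, ToHashSet() is only available from .NET Framework 4.7.2 and .NET Core 2.0 onwards. On older targets the HashSet<T> constructor does the same job. The sketch below uses the constructor and also compares the result with Except, which might look like a shortcut but behaves differently when requiredEventDates contains duplicates. The sample dates are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; note the duplicated 4 January.
var eventDays = new[] { new DateTime(2024, 1, 1), new DateTime(2024, 1, 2) };
var requiredEventDates = new[]
{
    new DateTime(2024, 1, 2),
    new DateTime(2024, 1, 4),
    new DateTime(2024, 1, 4),
};

// Constructor form: builds the same set that ToHashSet() would.
var existingEventDays = new HashSet<DateTime>(eventDays);

// Where + HashSet.Contains keeps duplicates from requiredEventDates.
var viaWhere = requiredEventDates.Where(d => !existingEventDays.Contains(d)).ToList();

// Except is a set difference: it returns each missing date only once.
var viaExcept = requiredEventDates.Except(eventDays).ToList();

Console.WriteLine($"Where: {viaWhere.Count}, Except: {viaExcept.Count}");
// prints "Where: 2, Except: 1"
```

So Except is only a drop-in replacement if you don't care about (or don't have) repeated dates in requiredEventDates.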