While writing a solution for a coding problem I discovered an interesting behavior of my LINQ statements. I had two scenarios:
First:
arr.Select(x => x + 5).OrderBy(x => x)
Second:
arr.OrderBy(x => x).Select(x => x + 5)
After a little bit of testing with System.Diagnostics.Stopwatch I got the following results for an integer array of length 100_000.
For the first approach:
00:00:00.0000152
For the second:
00:00:00.0073650
Now I'm interested in why it takes more time if I do the ordering first. I wasn't able to find anything on Google, so I thought about it myself.
I ended up with two ideas:
1. The second scenario has to convert to IOrderedEnumerable and then back to IEnumerable while the first scenario only has to convert to IOrderedEnumerable and not back.
2. You end up having 2 loops. The first for sorting and the second for the selecting while approach 1 does everything in 1 loop.
So my question is: why does it take so much more time to do the ordering before the select?
Depending on which LINQ provider you use, the query may get optimized. For example, if you query some kind of database, chances are high that your provider would generate the exact same query for both statements, similar to this one:
select myColumn from myTable order by myColumn;
Thus performance should be identical, no matter whether you order first or select first in LINQ.
As this does not seem to happen here, you are probably using LINQ to Objects, which performs no such query optimization. So the order of your statements may have an effect, in particular if you had some kind of filter using Where that filters out many objects, so that later statements don't operate on the entire collection.
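To illustrate the point about Where, here is a minimal sketch, assuming LINQ to Objects. The class name WhereOrderDemo, the helper CountKeySelectorCalls and the filter x % 10 == 0 are made up for this example. It counts how often OrderBy evaluates its key selector, which serves as a rough proxy for how many elements the sort has to handle:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereOrderDemo
{
    // Hypothetical helper: returns how many times OrderBy invoked its key selector.
    public static int CountKeySelectorCalls(int[] arr, bool filterFirst)
    {
        int calls = 0;
        Func<int, int> key = x => { calls++; return x; };

        IEnumerable<int> query = filterFirst
            ? arr.Where(x => x % 10 == 0).OrderBy(key)   // sort only the survivors
            : arr.OrderBy(key).Where(x => x % 10 == 0);  // sort everything, then filter

        // LINQ to Objects is deferred; ToArray forces the query to actually run.
        query.ToArray();
        return calls;
    }

    public static void Main()
    {
        int[] arr = Enumerable.Range(0, 100_000).Reverse().ToArray();

        Console.WriteLine("Where first:   " + CountKeySelectorCalls(arr, true));
        Console.WriteLine("OrderBy first: " + CountKeySelectorCalls(arr, false));
    }
}
```

With the filter first, the sort only ever sees the elements that passed the filter, so it should report far fewer key selector calls than the variant that sorts the whole array.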
To make a long story short: the difference most probably comes from some internal initialization logic. As a dataset of 100000 numbers is not really big - at least not big enough - even a fast initialization has a big impact on the measured time.
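One way to check whether one-time initialization dominates is to run both queries once before timing them. The following is only a sketch, not a rigorous benchmark. The names LinqTimingSketch, SelectFirst, OrderFirst and Time are invented for this example, and it assumes the timings should cover the actual enumeration, which is why each query ends in ToArray (LINQ to Objects queries are deferred until they are enumerated):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class LinqTimingSketch
{
    // The two scenarios from the question, forced to run via ToArray.
    public static int[] SelectFirst(int[] arr) =>
        arr.Select(x => x + 5).OrderBy(x => x).ToArray();

    public static int[] OrderFirst(int[] arr) =>
        arr.OrderBy(x => x).Select(x => x + 5).ToArray();

    // Runs the given query once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Func<int[]> query)
    {
        var sw = Stopwatch.StartNew();
        query();
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        var rng = new Random(42);
        int[] arr = Enumerable.Range(0, 100_000)
                              .Select(_ => rng.Next(1_000_000))
                              .ToArray();

        // Warm-up: pay for JIT compilation and other one-time setup up front.
        SelectFirst(arr);
        OrderFirst(arr);

        Console.WriteLine("Select then OrderBy: " + Time(() => SelectFirst(arr)));
        Console.WriteLine("OrderBy then Select: " + Time(() => OrderFirst(arr)));
    }
}
```

If the two timings end up close to each other after the warm-up, that would support the initialization explanation. For anything more serious than a quick check, a dedicated benchmarking tool is the better choice.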