Tags: c#, performance, list, benchmarking

Inconsistent behavior of List.Contains and List.IndexOf when measuring speed


I need to process large numbers of strings quickly using C#. To find the fastest way to do this, I've been using the following benchmarking function:

delegate void Test();

// requires: using System; using System.Diagnostics;
static void time(Test test, int iter, string label)
{
    Stopwatch timer = new Stopwatch();
    timer.Reset();
    timer.Start();

    // run the test delegate iter times inside the timed window
    int i = 0;
    while (i < iter)
    {
        test();
        i++;
    }

    Console.WriteLine(label + ": " + timer.ElapsedMilliseconds.ToString());
    timer.Reset();
}

When I run this code:

int iter = 10000000;
string[] array = new string[] { "cat", "dog", "horse", "cow", "dimorphodon", "a", "a", "dog", "horse", "cow", "dimorphodon", "a", "a", "pig" };
List<string> list = new List<string>(array);

time(() => { int i = 0; while (i < array.Length) { if (array[i] == "cat") { return; } i++; } return; }, iter, "array search near        ");
time(() => { int i = 0; while (i < array.Length) { if (array[i] == "pig") { return; } i++; } return; }, iter, "array search far         ");
time(() => { int i = Array.IndexOf(array, "cat"); }, iter, "array IndexOf near       ");
time(() => { int i = Array.IndexOf(array, "pig"); }, iter, "array IndexOf far        ");
time(() => { list.Contains("cat"); }, iter, "list contains near        ");
time(() => { list.Contains("pig"); }, iter, "list contains far         ");
time(() => { int i = list.IndexOf("cat"); }, iter, "list IndexOf near        ");
time(() => { int i = list.IndexOf("pig"); }, iter, "list IndexOf far         ");
time(() => { int i = 0; while (i < list.Count) { if (list[i] == "cat") { return; } i++; } return; }, iter, "list search near         ");
time(() => { int i = 0; while (i < list.Count) { if (list[i] == "pig") { return; } i++; } return; }, iter, "list search far          ");

with optimizations enabled, I consistently see that iteratively searching the array is the fastest option, with searching the list only very slightly slower. List.IndexOf and Array.IndexOf are much slower: 3-4 times slower for values near the front of the list, with the gap narrowing at higher indices and reaching about 2 times as slow around 20-30 elements. List.Contains is the slowest of all, roughly 1.2 times slower than IndexOf, with or without optimizations.

I see some discussion of why Contains might be slower than an iterative search here (Contains is implemented generically, so it creates a generic EqualityComparer object that is unnecessary when dealing with strings). And although the difference between the implementations of Contains and IndexOf is elaborated on here, looking at the implementation only seems to confirm that Contains should behave the same as IndexOf: they both create a generic EqualityComparer and use it to compare the elements of the list's internal array to the argument.

This suggests to me that the problem is likely with my benchmarking function. Is there some element to the differences between Contains and IndexOf that I'm missing, or is there a problem with my benchmarking function?

EDIT: I'm now sure there is a problem with my benchmarking. Performing a slightly different set of tests:

time(() => { list.Contains("cat"); }, iter, "list short contains ");
time(() => { list.Contains("cat"); }, iter, "list short indexof  ");
time(() => { list.Contains("cat"); }, iter*10, "list long contains  ");
time(() => { list.Contains("cat"); }, iter*10, "list long indexof   ");
time(() => { list.Contains("cow"); }, iter, "list short contains ");
time(() => { list.Contains("cow"); }, iter, "list short indexof  ");
time(() => { list.Contains("cow"); }, iter * 10, "list long contains  ");
time(() => { list.Contains("c"); }, iter * 10, "list long indexof   ");

results in completely different times, with IndexOf and Contains performing much more closely to each other. I have no idea how or why this is possible.

Edit again: Just to be clear, I'm almost certain that the weirdness in the timing has something to do with my implementation of the timing function, though it might also involve the optimizer. I am seeing that List.Contains is usually slower than List.IndexOf, but not always, and changing the order in which I measure somehow gives me different results. My question is: what is causing the inconsistency in the times noted above?


Solution

  • I'll noodle about this a little bit; benchmarking is a complicated art with many subtleties and traps. The bigger problem with this kind of test is that you are profiling code that is fast. I clock the array search test at 8 nanoseconds on my poky laptop. Such a result is strongly Heisenbergian: the test itself significantly affects the measurement.

    There is a lot that needs to happen to make that very fast code execute. The delegate target needs to be just-in-time compiled, the delegate needs to be bound to the jitted code, and the delegate call is always an indirect call that cannot be optimized. The while() loop that repeats the test iter times adds overhead of its own. All of this is code whose execution time is included in the measurement but is not what you actually care about.
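
    One common mitigation, sketched below (my illustration, not something the question's code does), is to invoke the delegate once before starting the stopwatch, so the one-time JIT and delegate-binding costs fall outside the timed window. The timeWarm name is hypothetical; everything else mirrors the question's time() helper.

    // A minimal sketch, assuming the question's Test delegate type is in scope.
    static void timeWarm(Test test, int iter, string label)
    {
        test(); // warm-up: JIT compilation and delegate binding happen here, untimed

        Stopwatch timer = Stopwatch.StartNew();
        for (int i = 0; i < iter; i++)
        {
            test();
        }
        timer.Stop();
        Console.WriteLine(label + ": " + timer.ElapsedMilliseconds);
    }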

    A significant randomizer of the test results is the processor itself. Modern processors have exceedingly non-deterministic code execution behavior. They depend critically on the state of the memory caches and on how much the core has learned about the way the code executes; the branch predictor heavily affects execution time.

    That the placement of the code affects the test result is a side-effect of what a processor does with the machine code that the jitter generates. Processors do not execute that code directly anymore. The actual execution engine of the processor executes a completely different kind of instruction, the micro-op. Micro-ops are simple, RISC-like instructions that can easily be distributed among the various execution units and executed out of order. A processor like Haswell can execute as many as 8 of those micro-ops at the same time.

    That doesn't actually happen that often in practice. A very nontrivial circuit in the processor is the instruction decoder, the logic that needs to translate x86 or x64 machine code into micro-ops. The mental model here is that of a just-in-time compiler, just like the one .NET uses, but with a big difference: this jitter needs to translate a very convoluted instruction set with variable-length instructions to feed a very hungry execution engine that can swallow 8 micro-ops per clock cycle. That this works at all is a stupendous feat, and I have no real idea how the hardware designers pulled off that stunt. In fact it doesn't always work: the decoder cannot keep up with that rate and will usually fall behind if the code is branchy. Like it is in your "near" tests. Alignment of the code in the L1 instruction cache plays an important role, which is why placement matters. Not a knob you can tweak.

    None of this behavior can be easily observed and is rather random. A guideline I use is that measurements that differ by 15% or less are not statistically meaningful.
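
    One way to respect that guideline (my sketch, not part of the original answer) is to repeat the whole timed loop several times and compare medians rather than single runs; the timeMedian and runs names are hypothetical.

    // A sketch: run the entire timed loop several times and report the median,
    // which is more robust against cache and branch-predictor noise than any
    // single run.
    static long timeMedian(Test test, int iter, int runs)
    {
        List<long> samples = new List<long>(runs);
        for (int r = 0; r < runs; r++)
        {
            Stopwatch timer = Stopwatch.StartNew();
            for (int i = 0; i < iter; i++)
            {
                test();
            }
            timer.Stop();
            samples.Add(timer.ElapsedMilliseconds);
        }
        samples.Sort();
        return samples[runs / 2]; // middle sample after sorting
    }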

    Your benchmark code has flaws, ones that somewhat accidentally don't affect the results. One severe flaw is that you don't make the code accomplish anything: you never use the result of the computation. In other words, you don't use the i result at all. The jitter optimizer loves code like that; the fastest possible code is code that doesn't execute, and if it can, it will eliminate such code.
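
    A hedged sketch of one way to guard against that (my addition; the sink field is hypothetical): accumulate every result into a field that is observed after the tests, so the searches cannot be treated as dead code.

    static int sink; // read at the end of the run, which keeps the work "live"

    // ...
    time(() => { sink += Array.IndexOf(array, "pig"); }, iter, "array IndexOf far");
    time(() => { sink += list.IndexOf("pig"); }, iter, "list IndexOf far");
    // ...
    Console.WriteLine(sink); // printing the accumulated value defeats dead-code elimination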

    Elimination doesn't actually happen in this test, though: the string comparison is too convoluted. The default comparison uses StringComparison.CurrentCulture, and that requires a call into a CLR helper method that consults an oracle for how strings are supposed to be compared according to whatever Thread.CurrentCulture dictates. That code cannot be inlined, so the optimizer cannot eliminate it. This happens in the inner loop of the code, so nothing can be eliminated.

    So by and large, the result you got is probably accurate enough: writing your own Array.IndexOf() method is in fact the fastest way on very short arrays. That isn't very surprising if you look at the Array.IndexOf() source code. Adding the null and Rank tests, making it work for non-conformant arrays, and relying on Object.Equals() do not come for free. It is still very fast, but that overhead shows up because the rest of the code is so fast.
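
    For concreteness, the "write your own" version the question effectively benchmarks is just a bare loop. A minimal sketch (the IndexOfFast name is mine), with none of the validation that Array.IndexOf() performs:

    // No null or Rank checks, no generic comparer: just a direct loop
    // with string == compares.
    static int IndexOfFast(string[] array, string value)
    {
        for (int i = 0; i < array.Length; i++)
        {
            if (array[i] == value)
                return i;
        }
        return -1; // not found
    }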

    Microsoft deemed the extra work that Array.IndexOf() does worth the overhead; it makes for better code that fails in a more diagnosable way and is more universal. It is cheap enough in practical usage.

    Which is the key to benchmarking: the closer the test is to the actual practical usage in your program, using real data instead of cats and cows, the more trustworthy the test result. Fairly doubtful that you got there.