
Reflection.Emit Performance


Here's a simple question.

Let's say we want to unroll a looping method such as:

public int DoSum1(int n)
{
    int result = 0;
    for (int i = 1; i <= n; i++)
    {
        result += i;
    }
    return result;
}

Into a method performing simple additions only:

public int DoSum2()
{
    return 1+2+3+4+5+6+7+8+9+10+11+12+13+14+15+16+17+18+19+20;
}

http://etutorials.org/Programming/Programming+C.Sharp/Part+III+The+CLR+and+the+.NET+Framework/Chapter+18.+Attributes+and+Reflection/18.3+Reflection+Emit/

Logically, we're going to need code to create DoSum2 in IL at some point. In this IL generation code we will perform an actual loop with the same iteration count as the unoptimized method.
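
To make that concrete, the generation code might look roughly like this (a sketch only; the name BuildDoSum2 and the choice of DynamicMethod are just for illustration). Note that the generator itself loops n times while emitting the constants:

using System;
using System.Reflection.Emit;

static Func<int> BuildDoSum2(int n)
{
    var method = new DynamicMethod("DoSum2", typeof(int), Type.EmptyTypes);
    ILGenerator il = method.GetILGenerator();

    il.Emit(OpCodes.Ldc_I4_0);        // running total starts at 0
    for (int i = 1; i <= n; i++)      // the generator loops n times itself
    {
        il.Emit(OpCodes.Ldc_I4, i);   // push the constant i
        il.Emit(OpCodes.Add);         // add it to the running total
    }
    il.Emit(OpCodes.Ret);

    return (Func<int>)method.CreateDelegate(typeof(Func<int>));
}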

What's the point of creating a super fast dynamic method if the code required to generate it will use a similar amount of time to execute?

Perhaps you can give an example of when it's worth using Emit in a case like this?


Solution

  • What's the point of creating a super fast dynamic method if the code required to generate it will use a similar amount of time to execute

    This isn't really specific to Reflection.Emit, but to runtime code generation in general, so I will answer accordingly.

    First, I do not recommend using code generation simply to perform micro-optimizations that compilers normally perform, like loop unrolling. Let the JIT compiler do its job.

    Second, you are right in that there is usually little point in generating code that will only execute once. The time required to emit and JIT compile the IL is not insubstantial. You should only bother generating code if it will be executed many times.
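
    As a rough illustration (using the hypothetical BuildDoSum2 generator sketched in the question): the emit and JIT cost is paid once, and it only pays off if the resulting delegate is then invoked many times.

    Func<int> doSum2 = BuildDoSum2(20);     // pay the emit + JIT cost exactly once

    long total = 0;
    for (int call = 0; call < 10_000_000; call++)
    {
        total += doSum2();                  // each call is now an ordinary delegate invocation
    }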

    Now, there definitely are cases where runtime code generation can prove beneficial. In fact, it's a technique I leverage heavily. I work in an electronic trading environment where it is necessary to process very high volumes of dynamic data. This introduces several concerns, the most significant being memory usage and throughput.

    Our trading application needs to keep a lot of data in memory, so the footprint of each record is critical. Dynamic data structures like maps/dictionaries are less efficient than "POCO" classes with optimized field layouts and, depending on the design, may require boxing some values. I avoid this overhead by generating client-side storage classes once the shape of the data is known. In effect, the memory layout is as it would have been had I known the shape of the data at compile time.
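
    A minimal sketch of that idea (illustrative only; it assumes the field names and types arrive at runtime, and a real implementation would also generate accessors and serializers for the emitted type):

    using System;
    using System.Reflection;
    using System.Reflection.Emit;

    // Emit a class whose fields match the runtime-discovered shape of the data,
    // instead of storing each record in a Dictionary<string, object> that boxes values.
    static Type BuildRecordType((string Name, Type FieldType)[] shape)
    {
        var asm = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DynamicRecords"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = asm.DefineDynamicModule("Main");
        TypeBuilder builder = module.DefineType("Record",
            TypeAttributes.Public | TypeAttributes.Sealed);

        foreach (var (name, fieldType) in shape)
            builder.DefineField(name, fieldType, FieldAttributes.Public);

        return builder.CreateType();
    }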

    Throughput is a major issue as well; (de)serializing dynamic data often involves some additional introspection and extra layers of indirection. Need to serialize a record? OK, first you need to query what the fields are. Then, for each field, you need to determine its type, then select a serializer for that type, and then invoke the serializer. If your data structure has optional fields, you may need to do some additional pre-processing, like figuring out the size of a presence map, and which bits in the presence map correspond to which fields. If you need to process a ton of data, all that overhead becomes a real problem. I avoid this overhead by generating specialized (de)serializers on both the server side and client side. Since the serializers are generated on demand, they can know the exact shape of the data, and read/write that data as efficiently as a hand-optimized serializer. When you have a high volume of data updating at very high frequencies, this can make a huge difference.
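
    Again as a sketch only (it assumes flat record types with public primitive fields, such as the ones emitted above, and writes to a BinaryWriter; the real serializers also deal with optional fields and presence maps): a serializer generated for one specific record type can load and write each field directly, with no per-record introspection.

    using System;
    using System.IO;
    using System.Reflection;
    using System.Reflection.Emit;

    static Action<BinaryWriter, object> BuildSerializer(Type recordType)
    {
        var dm = new DynamicMethod("Serialize_" + recordType.Name, typeof(void),
            new[] { typeof(BinaryWriter), typeof(object) });
        ILGenerator il = dm.GetILGenerator();

        foreach (FieldInfo field in recordType.GetFields())
        {
            il.Emit(OpCodes.Ldarg_0);                // the BinaryWriter
            il.Emit(OpCodes.Ldarg_1);                // the record, typed as object
            il.Emit(OpCodes.Castclass, recordType);  // cast to the concrete record type
            il.Emit(OpCodes.Ldfld, field);           // load the field value
            // call the BinaryWriter.Write overload matching the field's type
            MethodInfo write = typeof(BinaryWriter).GetMethod("Write", new[] { field.FieldType });
            il.Emit(OpCodes.Callvirt, write);
        }
        il.Emit(OpCodes.Ret);

        return (Action<BinaryWriter, object>)dm.CreateDelegate(typeof(Action<BinaryWriter, object>));
    }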

    Now, keep in mind that we're something of an edge case. Most applications do not have the aggressive memory and throughput requirements that ours has, so runtime code generation isn't necessary. You should only go that route if you really need it, and you have exhausted all other possibilities. Although it can help with performance, generated code can be very difficult to debug and maintain.