Generate and compile name to index translation/mapping for faster reusability


Suppose I get data from a service (that I can't control) as:

public class Data
{
    // an array of column names
    public string[] ColumnNames { get; set; }

    // an array of rows that contain arrays of strings as column values
    public string[][] Rows { get; set; }
}

and on the middle tier I would like to map/translate this to an IEnumerable<Entity>, where column names in Data may be represented as properties in my Entity class. I say "may" because I might not need all the data returned by the service, just some of it.

Transformation

This is an abstraction of an algorithm that would do the translation:

  1. create an IDictionary<string, int> of ColumnNames so I can easily map individual column names to array indices in individual rows.
  2. use reflection to examine my Entity properties' names so I'm able to match them with column names
  3. iterate through Data.Rows and create my Entity objects and populate properties according to mapping done in #1. Likely using reflection and SetValue on properties to set them.
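The three steps above can be sketched roughly as follows. This is only an illustration of the unoptimised, reflection-based version: the Entity class, the ReflectionMapper name, the case-insensitive column matching and the use of Convert.ChangeType for string-to-property conversion are all assumptions, not something the service or the question dictates.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;

public class Data
{
    public string[] ColumnNames { get; set; }
    public string[][] Rows { get; set; }
}

// Hypothetical entity, used only for illustration.
public class Entity
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ReflectionMapper
{
    public static IEnumerable<T> Map<T>(Data data) where T : new()
    {
        // Step 1: column name -> index within each row.
        // Case-insensitive matching is an assumption.
        var indexByColumn = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < data.ColumnNames.Length; i++)
            indexByColumn[data.ColumnNames[i]] = i;

        // Step 2: keep only writable properties that have a matching column.
        var bindings = new List<KeyValuePair<PropertyInfo, int>>();
        foreach (var prop in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            int index;
            if (prop.CanWrite && indexByColumn.TryGetValue(prop.Name, out index))
                bindings.Add(new KeyValuePair<PropertyInfo, int>(prop, index));
        }

        // Step 3: create one entity per row and set the matched properties.
        // Convert.ChangeType does not handle Nullable<T> or enums; a real
        // mapper would need its own conversion rules.
        foreach (var row in data.Rows)
        {
            var entity = new T();
            foreach (var binding in bindings)
            {
                object value = Convert.ChangeType(
                    row[binding.Value], binding.Key.PropertyType, CultureInfo.InvariantCulture);
                binding.Key.SetValue(entity, value, null);
            }
            yield return entity;
        }
    }
}
```

Columns without a matching property (and properties without a matching column) are simply skipped, which covers the "I may not need all the data" case.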

Optimisation

The algorithm above would of course work, but because it relies on reflection, I think it should do some caching and possibly some on-the-fly compilation, which could speed things up considerably.

Once steps 1 and 2 are done, we could generate a method that takes an array of strings and instantiates my entities using the indices directly, then compile it and cache it for future reuse.

I'm usually getting a page of results, so subsequent requests would reuse the same compiled method.
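One way the "generate, compile and cache" idea might look with lambda expressions is sketched below. It builds a Func<string[], T> per column layout using System.Linq.Expressions and keeps it in a dictionary. The CompiledMapper name, the cache key (the joined column names), the hypothetical Entity class and the Convert.ChangeType conversion are assumptions made for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Globalization;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity, used only for illustration.
public class Entity
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CompiledMapper<T> where T : new()
{
    // One compiled delegate per column layout, because the indices
    // baked into the delegate depend on the column order.
    private static readonly ConcurrentDictionary<string, Func<string[], T>> Cache =
        new ConcurrentDictionary<string, Func<string[], T>>();

    private static readonly MethodInfo ChangeType = typeof(Convert).GetMethod(
        "ChangeType", new[] { typeof(object), typeof(Type), typeof(IFormatProvider) });

    public static Func<string[], T> Get(string[] columnNames)
    {
        string key = string.Join("\u0001", columnNames);
        return Cache.GetOrAdd(key, _ => Build(columnNames));
    }

    private static Func<string[], T> Build(string[] columnNames)
    {
        var row = Expression.Parameter(typeof(string[]), "row");
        var bindings = new List<MemberBinding>();

        foreach (var prop in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            int index = Array.FindIndex(columnNames,
                c => string.Equals(c, prop.Name, StringComparison.OrdinalIgnoreCase));
            if (index < 0 || !prop.CanWrite) continue;

            // row[index], with the index embedded as a constant
            Expression cell = Expression.ArrayIndex(row, Expression.Constant(index));

            // Strings are assigned as-is; other types go through Convert.ChangeType.
            Expression value = prop.PropertyType == typeof(string)
                ? cell
                : Expression.Convert(
                    Expression.Call(ChangeType,
                        Expression.Convert(cell, typeof(object)),
                        Expression.Constant(prop.PropertyType, typeof(Type)),
                        Expression.Constant(CultureInfo.InvariantCulture, typeof(IFormatProvider))),
                    prop.PropertyType);

            bindings.Add(Expression.Bind(prop, value));
        }

        // Equivalent to: row => new T { Prop1 = ..., Prop2 = ... }
        var body = Expression.MemberInit(Expression.New(typeof(T)), bindings);
        return Expression.Lambda<Func<string[], T>>(body, row).Compile();
    }
}
```

With this shape, each page of results would call CompiledMapper<Entity>.Get(data.ColumnNames) once and then apply the returned delegate to every row; reflection is only touched the first time a given column layout is seen.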

Additional fact

This is not essential to the question (or the answers), but I also created two attributes that help with column-to-property mapping when the names don't match: the obvious MapNameAttribute (which takes a string and optionally enables case sensitivity) and IgnoreMappingAttribute for properties on my Entity that shouldn't map to any data. These attributes are read in step 2 of the algorithm above, so property names are collected and renamed according to this declarative metadata to match the column names.

Question

What is the best and easiest way to generate and compile such a method? Lambda expressions? CSharpCodeProvider class?

Do you maybe have an example of generated and compiled code that does a similar thing? I guess that mappings are a rather common scenario.

Note: In the meantime I will be examining PetaPoco (and maybe also Massive) because afaik they both do compilation and caching on the fly exactly for mapping purposes.


Solution

  • Suggestion: obtain FastMember from NuGet

    Then just use:

    var accessor = TypeAccessor.Create(typeof(Entity));
    

    Then just in your loop, when you have found the memberName and newValue for the current iteration:

    accessor[obj, memberName] = newValue;
    

    This is designed to do what you are asking; internally, it maintains a set of types it has seen before. When a new type is seen, it creates a new subclass of TypeAccessor on-the-fly (via TypeBuilder) and caches it. Each unique TypeAccessor is aware of the properties for that type, and basically just acts like a:

    switch(memberName) {
        case "Foo": obj.Foo = (int)newValue; break;
        case "Bar": obj.Bar = (string)newValue; break;
        // etc
    }
    

    Because this is cached, you only pay any cost (and not really a big cost) the first time it ever sees your type; after that, the cached accessor is reused with no further generation cost. Because it uses ILGenerator directly, it also avoids any unnecessary abstraction, for example via Expression or CodeDom, so it is about as fast as it can be.

    (I should also clarify that for dynamic types, i.e. types that implement IDynamicMetaObjectProvider, it can use a single instance to support every object).


    Additional:

    What you could do is: take the existing FastMember code, and edit it to process MapNameAttribute and IgnoreMappingAttribute during WriteGetter and WriteSetter; then all the voodoo happens on your data names, rather than the member names.

    This would mean changing the lines:

    il.Emit(OpCodes.Ldstr, prop.Name);
    

    and

    il.Emit(OpCodes.Ldstr, field.Name);
    

    in both WriteGetter and WriteSetter, and doing a continue at the start of the foreach loops if it should be ignored.
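To make the "process the attributes" part concrete, here is a rough sketch of a helper that resolves the data name for a member. The attribute definitions are guesses at what MapNameAttribute and IgnoreMappingAttribute might look like (the question only names them), the case-sensitivity option is left out, and MappingNames.Resolve is a hypothetical helper, not part of FastMember.

```csharp
using System;
using System.Reflection;

// Hypothetical shapes for the two attributes named in the question.
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class MapNameAttribute : Attribute
{
    public MapNameAttribute(string name) { Name = name; }
    public string Name { get; private set; }
}

[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class IgnoreMappingAttribute : Attribute { }

public static class MappingNames
{
    // Returns the data (column) name for a member,
    // or null when the member should be skipped.
    public static string Resolve(MemberInfo member)
    {
        if (Attribute.IsDefined(member, typeof(IgnoreMappingAttribute)))
            return null;

        var map = (MapNameAttribute)Attribute.GetCustomAttribute(
            member, typeof(MapNameAttribute));
        return map != null ? map.Name : member.Name;
    }
}

// Hypothetical entity showing both attributes in use.
public class Entity
{
    [MapName("customer_id")]
    public int Id { get; set; }

    public string Name { get; set; }

    [IgnoreMapping]
    public string Computed { get; set; }
}
```

In the edited loops, the idea would be to call something like Resolve(prop) first, continue when it returns null, and pass the returned string to Ldstr in place of prop.Name (and likewise for field.Name).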