Search code examples
c#reflectionexpression-trees

Why does this linq with reflection statement beat my compiled expression tree?


Having been inspired by this blogpost I set out to refactor the following Linq query using compiled expression trees:

var result = dummies.Select(y =>
                   y.GetType().GetProperties()
                   .Where(x => x.GetMethod.IsPublic)
                   .Where(x => fields.Contains(x.Name, StringComparer.OrdinalIgnoreCase))
                   .ToDictionary(x => x.Name, x => x.GetValue(y)))
                   .Where(x => x.Any());

The code is intended to extract a set of values of specified properties, returning a dictionary for each element. In my first encounter with writing my own expression trees I've come up with this solution to generate the property calls:

        foreach (string propName in Properties)
        {
            var prop = typeof(DummyType).GetProperty(propName);

            if (prop != null)
            {
                props.Add(prop);
            }
        }

        var accessors = new List<Tuple<string, Func<DummyType, object>>>();

        foreach (var prop in props)
        {
            var instance = Expression.Parameter(typeof(DummyType));
            var call = Expression.Property(instance, prop);
            var expr = Expression.Lambda<Func<DummyType, object>>(call, instance).Compile();
            accessors.Add(Tuple.Create(prop.Name, expr));
        }

For each DummyType element the calls in accessors will be iterated This implementation is unable to deal with properties returning value types, though I was able to solve that issue using MakeGenericType combined with a DynamicInvoke call but because it's documented as "late-bound" I've discarded it to avoid it distorting the performance.

The results are surprising with the Linq query beating my expression trees hands down even though it boxes value types for me and GetProperties is called for each element, whereas the linq expression property accessors are generated before collecting the values from the collection of dummytypes.

|   Method |         Mean |      Error |     StdDev |  Ratio | RatioSD |
|--------- |-------------:|-----------:|-----------:|-------:|--------:|
|     Linq |     73.09 ns |   0.878 ns |   0.778 ns |   1.00 |    0.00 |
| ExprTree | 16,293.69 ns | 184.834 ns | 172.894 ns | 222.83 |    3.96 |

The benchmarks are generated using benchmark.net.

  1. Why is the expression tree approach significantly slower?
  2. Would it be fair to assume the expression tree solution is faster?
  3. Bonus: What effect does using the MakeGenericType solution have on performance in this context?

EDIT: I've refactored the code somewhat

  • It no longer works with generic types but only on DummyType
  • The linq solution no longer queries for public properties only to deny it the performance benefit
  • The expr. tree solution only gets the "accessors"/"function pointers" (?) once when called with the properties to extract and set of DummyType instances whereas the linq query retrieves the properties for every element in the dummies collection
  • Only the string property is extracted to avoid having to cater for value type properties on DummyType
  • A colllection of dictionaries is no longer created but instead a 1 dimensional collection of key-value pairs
    public class MyBenchMarks    
    {
        IEnumerable<DummyType> Dummies = DummyType.GenerateDummySet();
        IEnumerable<string> Properties = new string[] { "Prop1" };

        [Benchmark(Description = "Linq", Baseline = true)]
        public Object LinqSolution() => new Mappers().LinqSolution(Properties, Dummies);


        [Benchmark(Description = "ExprTree", Baseline = false)]
        public void ExprTreeSolution() => new Mappers().ExprTreeSolution(Properties, Dummies);
    }

    public class Mappers
    {
        List<Tuple<string, Func<DummyType, object>>> GetAccessors(IEnumerable<string> fields)
        {
            List<PropertyInfo> props = new List<PropertyInfo>(fields.Select(x => typeof(DummyType).GetProperty(x)).Where(x => x != null));
            var accessors = new List<Tuple<string, Func<DummyType, object>>>();

            foreach (var prop in props)
            {
                var instance = Expression.Parameter(typeof(DummyType));
                var call = Expression.Property(instance, prop);
                var expr = Expression.Lambda<Func<DummyType, object>>(call, instance).Compile();

                accessors.Add(Tuple.Create(prop.Name, expr));
            }

            return accessors;
        }

        public IEnumerable<KeyValuePair<string, object>> ExprTreeSolution(IEnumerable<string> fields, IEnumerable<DummyType> dummies)
        {
            List<KeyValuePair<string, object>> result = new List<KeyValuePair<string, object>>();
            var accessors = GetAccessors(fields);

            foreach (var dummy in dummies)
            {
                foreach (var accessor in accessors)
                {
                    var propResult = accessor.Item2(dummy);
                    result.Add(KeyValuePair.Create(accessor.Item1, propResult));
                }
            }

            return result;
        }

        public IEnumerable<KeyValuePair<string, object>> LinqSolution<T>(IEnumerable<String> fields, IEnumerable<T> dummies)
        {
            var result = dummies.Select(y =>
                  y.GetType().GetProperties()
                  .Where(x => fields.Contains(x.Name, StringComparer.OrdinalIgnoreCase))
                  .Select(x => KeyValuePair.Create(x.Name, x.GetValue(y))).ToList())
                  .SelectMany(x => x);

            return result;
        }
    }

    public class DummyType
    {
        public bool Prop0 { get; set; }
        public string Prop1 { get; set; }
        public int Prop2 { get; set; }

        public static List<DummyType> GenerateDummySet()
        {
            return Enumerable.Range(0, 100).Select(x =>
                 new DummyType
                 {
                     Prop0 = true,
                     Prop1 = "fooBar",
                     Prop2 = x
                 }).ToList();
        }
    }

The corresponding results:

BenchmarkDotNet = v0.12.1, OS = Windows 10.0.19041.630(2004 /?/ 20H1)
Intel Core i5-8600K CPU 3.60GHz (Coffee Lake), 1 CPU, 6 logical and 6 physical cores
.NET Core SDK=5.0.100  [Host]     : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT[AttachedDebugger]
DefaultJob : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

|   Method |         Mean |      Error |     StdDev |    Ratio | RatioSD |
|--------- |-------------:|-----------:|-----------:|---------:|--------:|
| Linq     | 66.14 ns     | 0.162 ns   | 0.143 ns   | 1.00     | 0.00    |
| ExprTree | 70,366.57 ns | 500.248 ns | 443.457 ns | 1,063.84 | 7.51    |

this code can be run from a console application using

BenchmarkRunner.Run(typeof(Program).Assembly);

Solution

  • The main problem is that LinqSolution is returning a deferred LINQ IEnumerable<>. It's not actually doing the work (reflection). Try changing return result to return result.ToList(). That'll at least help ensure you're comparing apples to apples.

    enter image description here

    Beyond that, recognize that the act of compiling an expression is quite expensive. You probably won't see a big performance gain unless you reuse the compiled function many more times. To see this in action, try generating 10000 items in GenerateDummySet instead of just 100.

    enter image description here

    To take advantage of this in real-world code, try memoizing the compiled function (using a static Lazy<> initialization, for example).