Having been inspired by this blogpost I set out to refactor the following Linq query using compiled expression trees:
var result = dummies.Select(y =>
y.GetType().GetProperties()
.Where(x => x.GetMethod.IsPublic)
.Where(x => fields.Contains(x.Name, StringComparer.OrdinalIgnoreCase))
.ToDictionary(x => x.Name, x => x.GetValue(y)))
.Where(x => x.Any());
The code is intended to extract a set of values of specified properties, returning a dictionary for each element. In my first encounter with writing my own expression trees I've come up with this solution to generate the property calls:
foreach (string propName in Properties)
{
var prop = typeof(DummyType).GetProperty(propName);
if (prop != null)
{
props.Add(prop);
}
}
var accessors = new List<Tuple<string, Func<DummyType, object>>>();
foreach (var prop in props)
{
var instance = Expression.Parameter(typeof(DummyType));
var call = Expression.Property(instance, prop);
var expr = Expression.Lambda<Func<DummyType, object>>(call, instance).Compile();
accessors.Add(Tuple.Create(prop.Name, expr));
}
For each DummyType element the calls in accessors
will be iterated This implementation is unable to deal with properties returning value types, though I was able to solve that issue using MakeGenericType
combined with a DynamicInvoke
call but because it's documented as "late-bound" I've discarded it to avoid it distorting the performance.
The results are surprising with the Linq query beating my expression trees hands down even though it boxes value types for me and GetProperties
is called for each element, whereas the linq expression property accessors are generated before collecting the values from the collection of dummytypes.
| Method | Mean | Error | StdDev | Ratio | RatioSD |
|--------- |-------------:|-----------:|-----------:|-------:|--------:|
| Linq | 73.09 ns | 0.878 ns | 0.778 ns | 1.00 | 0.00 |
| ExprTree | 16,293.69 ns | 184.834 ns | 172.894 ns | 222.83 | 3.96 |
The benchmarks are generated using benchmark.net.
EDIT: I've refactored the code somewhat
DummyType
dummies
collectionDummyType
public class MyBenchMarks
{
IEnumerable<DummyType> Dummies = DummyType.GenerateDummySet();
IEnumerable<string> Properties = new string[] { "Prop1" };
[Benchmark(Description = "Linq", Baseline = true)]
public Object LinqSolution() => new Mappers().LinqSolution(Properties, Dummies);
[Benchmark(Description = "ExprTree", Baseline = false)]
public void ExprTreeSolution() => new Mappers().ExprTreeSolution(Properties, Dummies);
}
public class Mappers
{
List<Tuple<string, Func<DummyType, object>>> GetAccessors(IEnumerable<string> fields)
{
List<PropertyInfo> props = new List<PropertyInfo>(fields.Select(x => typeof(DummyType).GetProperty(x)).Where(x => x != null));
var accessors = new List<Tuple<string, Func<DummyType, object>>>();
foreach (var prop in props)
{
var instance = Expression.Parameter(typeof(DummyType));
var call = Expression.Property(instance, prop);
var expr = Expression.Lambda<Func<DummyType, object>>(call, instance).Compile();
accessors.Add(Tuple.Create(prop.Name, expr));
}
return accessors;
}
public IEnumerable<KeyValuePair<string, object>> ExprTreeSolution(IEnumerable<string> fields, IEnumerable<DummyType> dummies)
{
List<KeyValuePair<string, object>> result = new List<KeyValuePair<string, object>>();
var accessors = GetAccessors(fields);
foreach (var dummy in dummies)
{
foreach (var accessor in accessors)
{
var propResult = accessor.Item2(dummy);
result.Add(KeyValuePair.Create(accessor.Item1, propResult));
}
}
return result;
}
public IEnumerable<KeyValuePair<string, object>> LinqSolution<T>(IEnumerable<String> fields, IEnumerable<T> dummies)
{
var result = dummies.Select(y =>
y.GetType().GetProperties()
.Where(x => fields.Contains(x.Name, StringComparer.OrdinalIgnoreCase))
.Select(x => KeyValuePair.Create(x.Name, x.GetValue(y))).ToList())
.SelectMany(x => x);
return result;
}
}
public class DummyType
{
public bool Prop0 { get; set; }
public string Prop1 { get; set; }
public int Prop2 { get; set; }
public static List<DummyType> GenerateDummySet()
{
return Enumerable.Range(0, 100).Select(x =>
new DummyType
{
Prop0 = true,
Prop1 = "fooBar",
Prop2 = x
}).ToList();
}
}
The corresponding results:
BenchmarkDotNet = v0.12.1, OS = Windows 10.0.19041.630(2004 /?/ 20H1)
Intel Core i5-8600K CPU 3.60GHz (Coffee Lake), 1 CPU, 6 logical and 6 physical cores
.NET Core SDK=5.0.100 [Host] : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT[AttachedDebugger]
DefaultJob : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
| Method | Mean | Error | StdDev | Ratio | RatioSD |
|--------- |-------------:|-----------:|-----------:|---------:|--------:|
| Linq | 66.14 ns | 0.162 ns | 0.143 ns | 1.00 | 0.00 |
| ExprTree | 70,366.57 ns | 500.248 ns | 443.457 ns | 1,063.84 | 7.51 |
this code can be run from a console application using
BenchmarkRunner.Run(typeof(Program).Assembly);
The main problem is that LinqSolution is returning a deferred LINQ IEnumerable<>
. It's not actually doing the work (reflection). Try changing return result
to return result.ToList()
. That'll at least help ensure you're comparing apples to apples.
Beyond that, recognize that the act of compiling an expression is quite expensive. You probably won't see a big performance gain unless you reuse the compiled function many more times. To see this in action, try generating 10000 items in GenerateDummySet instead of just 100.
To take advantage of this in real-world code, try memoizing the compiled function (using a static Lazy<>
initialization, for example).