Matthew Cox Team : Web Development Tags : Technology Web Development

Fun with Expression Trees

Matthew Cox Team : Web Development Tags : Technology Web Development

A colleague of mine recently pointed me to a Visual Studio UserVoice item.

Apparently Microsoft is (again) considering implementing what they are referring to the safe navigation operator. I attempted to explain that it wasn’t a function that .net required since we could already achieve the same thing with a simple extension method. To which his response was “That’s so boring”, undeterred I attempted to show my colleague this extension method and was met with “Oh god, I just don’t care alright”. So rather than attempt further discussion I decided to write a blog about it.

For those that haven’t clicked the link above, basically the safe navigation operator looks like this:

Parent?.Child

 

And works basically like:

(Parent == null) ? null : Parent.Child

 

The idea being that by adding this operator you will be able to cut out a lot of boilerplate code for doing null checks before accessing object properties.

 

Getting back to the extension method I was talking about earlier. This is it:

public static TResult Get<T, TResult>(this T source, Func<T, TResult> func) where T : class
{
    if (source == null)
        return default(TResult);
    return func(source);
}

 

With the syntactic sugar Microsoft are proposing you could use:

Parent?.Child?.Child?.Value

 

With the current tooling and the above extension method you would be able to get the same result with:

Parent.Get(p => p.Child).Get(p => p.Child).Get(p => p.Value)

 

I know from a little bit of research that this extension is hardly a new idea, and in fact most people have chosen to name the same function IfNotNull, whereas I chose Get simply to get the number of keystrokes down to as few as possible. As you can see from the example above it didn’t work too well, with my example being more than double the length. That was when I hit on an idea, what if I could do something like this:

Parent.Get(p => p.Child.Child.Value)

 

I could pass in an expression tree and then parse the tree and evaluate each step individually to ensure I could safely get the value I was after.

That was when I came up with this:

public static TResult RecursiveGet<T, TResult>(this T source, Expression<Func<T, TResult>> func) where T : class
{
    if (!(func.Body is MemberExpression))
        throw new InvalidOperationException();
 
    var result = InnerRecursiveGet(source, (MemberExpression)func.Body);
 
    if (result == null)
        return default(TResult);
 
    return (TResult)result;
}
 
private static object InnerRecursiveGet(object obj, MemberExpression exp)
{
    if (exp.Expression is MemberExpression)
        obj = InnerRecursiveGet(obj, (MemberExpression)exp.Expression);
 
    if (obj == null)
        return null;
 
    return Expression.Lambda(Expression.PropertyOrField(Expression.Constant(obj), exp.Member.Name)).Compile().DynamicInvoke();
}

 

We simply walk our way through the expression tree and evaluate each member expression individually checking for a null every step of the way.

While it does work, something about that .Complie() didn’t sit right with me. Compilation takes a long time right? What I really should do instead of creating all these small expressions is just create one big one. Which lead me to this: (try not to look directly at it, you might go blind)

public static TResult ExpressionGet<T, TResult>(this T source, Expression<Func<T, TResult>> func) where T : class
{
    if (source == null)
        return default(TResult);
 
    var memberExpressions = new Stack();
    var item = func.Body;
 
    while (item.NodeType == ExpressionType.Call || item.NodeType == ExpressionType.MemberAccess)
    {
        memberExpressions.Push(item);
 
        if (memberExpressions.Peek() is MemberExpression)
            item = ((MemberExpression)item).Expression;
        else if (memberExpressions.Peek() is MethodCallExpression)
            item = ((MethodCallExpression)item).Object;
    }
 
    if (!(item is ParameterExpression))
        throw new InvalidOperationException();
 
    var parameter = Expression.Parameter(typeof(T));
    var returnVariable = Expression.Variable(typeof(TResult));
 
    var body = Expression.Block(
        new[] { returnVariable },
        InnerExpressionGet(memberExpressions, parameter, returnVariable),
        returnVariable
    );
 
    return (TResult)Expression.Lambda(body, parameter).Compile().DynamicInvoke(source);
}
 
private static Expression InnerExpressionGet(Stack expressionList, ParameterExpression parameter, ParameterExpression returnValue)
{
    var expression = expressionList.Pop();
    var variable = Expression.Variable(expression.Type);
 
    Expression innerExpression;
 
    switch (expression.NodeType)
    {
        case ExpressionType.MemberAccess:
            var member = (MemberExpression)expression;
            innerExpression = Expression.PropertyOrField(parameter, member.Member.Name);
            break;
        case ExpressionType.Call:
            var call = (MethodCallExpression)expression;
            innerExpression = Expression.Call(parameter, call.Method, call.Arguments);
            break;
        default:
            throw new InvalidOperationException();
    }
 
    if (expressionList.Count == 0)
        return Expression.IfThen(Expression.NotEqual(parameter, Expression.Constant(null)),
            Expression.Assign(returnValue, innerExpression)
        );
 
    return Expression.IfThen(Expression.NotEqual(parameter, Expression.Constant(null)),
        Expression.Block(
            new[] { variable },
            Expression.Assign(variable, innerExpression),
            InnerExpressionGet(expressionList, variable, returnValue)
        )
    );
}

 

So what does all this do? Basically it takes an expression tree that looks like this:

(((($p.Child).Child).Child).Child).Value

 

And turns it into an expression tree that looks like this: (By the way, I love these expression tree debug views. They make it so much easier to work out what’s going on.)

.Block(System.Int32 $var1) {
    .If ($var2 != null) {
        .Block(SampleApp.Parent $var3) {
            $var3 = $var2.Child;
            .If ($var3 != null) {
                .Block(SampleApp.Parent $var4) {
                    $var4 = $var3.Child;
                    .If ($var4 != null) {
                        .Block(SampleApp.Parent $var5) {
                            $var5 = $var4.Child;
                            .If ($var5 != null) {
                                .Block(SampleApp.Parent $var6) {
                                    $var6 = $var5.Child;
                                    .If ($var6 != null) {
                                        $var1 = $var6.Value
                                    } .Else {
                                        .Default(System.Void)
                                    }
                                }
                            } .Else {
                                .Default(System.Void)
                            }
                        }
                    } .Else {
                        .Default(System.Void)
                    }
                }
            } .Else {
                .Default(System.Void)
            }
        }
    } .Else {
        .Default(System.Void)
    };
    $var1
}

 

Now if I’ve gone to all of this effort to rewrite the expression tree in one go for performance reasons I better do some performance tests to see that I have actually improved the efficiency of this method.

Before I get to the tests and results, when I was researching this I did find another person who had written an extension method to achieve the same deep navigation that I was going for. What he came up with was this:

public static TResult TrivialGet<T, TResult>(this T source, Func<T, TResult> func) where T : class
{
    try
    {
        return func(source);
    }
    catch
    {
        return default(TResult);
    }
}

 

I laughed when I saw it and dismissed it as just trolling, however on the off chance that anyone out there doesn’t know why this is a bad idea I’ve included it in my tests to hopefully make it clear why you shouldn’t ever do something like this.

The tests themselves are simply new up a deeply nested object and try to get a value from it and time the results with a stopwatch. In my example my class was nested 4 levels deep and the extension methods were run 10,000 times.

Accessing property 4 levels deep. Successful test.

Extension Method

Timing

Get

3ms

RecursiveGet

7776ms

ExpressionGet

3020ms

TrivialGet

1ms

 

Accessing property 5 levels deep. Failing test.

Extension Method

Timing

Get

3ms

RecursiveGet

7980ms

ExpressionGet

3399ms

TrivialGet

120468ms

 

It’s nice to see from those results that my gut feeling that recursively recompiling expression trees is slower than my abomination of code that completely rewrites them. It also fairly clearly shows why what I’ve called TrivialGet is a bad idea. It executes beautifully up until the point where a null value is encountered, at which point an exception is thrown. Exceptions are slow okay. Walking the call stack to create the stack trace is something that takes a significant amount of resources and needs to be avoided wherever possible. From 1ms for 10,000 executions to 2mins for the same number of executions should give you an idea of how slow.

In the end all this really proves is that common sense still exists. When you make things easier for developers you make them harder for computers and vice versa. I, personally, would only begrudgingly accept the use of the original Get extension method in a production system and never any of the others. For something as fundamental and frequently used as a null check, everything bar the first extension are just far too slow. Finally I also question the architectural practice that leads you to having this sort of broken object chain and you not caring about where the break has occurred, but accept that it may be fine in some scenarios that I can’t think of right away.