Thus far we’ve seen that lambda expressions are a succinct syntax for declaring an “inline” method that can be converted to a delegate type. Expression lambdas (but not statement lambdas or anonymous methods) can also be converted to expression trees. A delegate is an object that enables you to pass around a method like any other object and invoke it at any time. An expression tree is an object that enables you to pass around the compiler’s analysis of the lambda expression. But why would you ever need that capability? Obviously, the compiler’s analysis is useful to the compiler when generating the CIL, but why is it useful to the developer to have an object representing that analysis at execution time? Let’s take a look at an example.
Consider the lambda expression in the following code:
persons.Where(
person => person.Name.ToUpper() == "INIGO MONTOYA");
Suppose that persons is an array of Persons, and the formal parameter of the Where method that corresponds to the lambda expression argument is of the delegate type Func<Person, bool>. The compiler emits a method that contains the code in the body of the lambda. It generates code that creates a delegate to the emitted method and passes the delegate to the Where method. The Where method returns a query object that, when executed, applies the delegate to each member of the array to determine the query results.
Now suppose that persons is not an array of Persons, but rather an object that represents a remote database table containing data on millions of people. Information about each row in the table can be streamed from the server to the client, and the client can then create a Person object corresponding to that row. The call to Where returns an object that represents the query. When the results of that query are requested on the client, how are the results determined?
One technique would be to transmit several million rows of data from the server to the client. You could create a Person object from each row, create a delegate from the lambda, and execute the delegate on every Person. This is conceptually no different from the array scenario, but it is far, far more expensive.
A second, much better technique is to somehow send the meaning of the lambda (“filter out every row that names a person other than Inigo Montoya”) to the server. Database servers are optimized to rapidly perform this sort of filtering. The server can then choose to stream only the tiny number of matching rows to the client. Thus, instead of creating millions of Person objects and rejecting almost all of them, the client creates only those objects that already match the query, as determined by the server. But how does the meaning of the lambda get sent to the server?
This scenario is the motivation for adding expression trees to the language. Lambda expressions converted to expression trees become objects that represent data describing the lambda expression rather than compiled code implementing an anonymous function. Since the expression tree represents data rather than compiled code, it is possible to analyze the lambda at execution time and use that information to construct a query that executes on a database, for example. The expression tree received by Where() might be converted into a SQL query that is passed to a database, as shown in Listing 13.25.
The expression tree passed to the Where() call says that the lambda argument consists of the following elements:
The Where() method takes this data and converts it to the SQL where clause by examining the data and building a SQL query string. However, SQL is just one possibility; you can build an expression tree evaluator that converts expressions to any query language.
At execution time, a lambda converted to an expression tree becomes an object graph containing objects from the System.Linq.Expressions namespace. The root object in the graph represents the lambda itself. This object refers to objects representing the parameters, a return type, and body expression, as shown in Figure 13.3. The object graph contains all the information that the compiler deduced about the lambda. That information can then be used at execution time to create a query. Alternatively, the root lambda expression has a method, Compile, that generates CIL on the fly and creates a delegate that implements the described lambda.
Figure 13.4 shows the types found in object graphs for unary and binary expressions in the body of a lambda.
A UnaryExpression represents an expression such as –count. It has a single child, Operand, of type Expression. A BinaryExpression has two child expressions, Left and Right. Both types have a NodeType property that identifies the specific operator, and both inherit from the base class Expression. Another 30 or so expression types, such as NewExpression, ParameterExpression, MethodCallExpression, and LoopExpression, are available that represent (almost) every possible expression in C# and Visual Basic.
The validity of a lambda expression is verified at compile time with a full semantic analysis, whether it is converted to a delegate or an expression tree. A lambda that is converted to a delegate causes the compiler to emit the lambda as a method and generates code that creates a delegate to that method at execution time. A lambda that is converted to an expression tree causes the compiler to generate code that creates an instance of LambdaExpression at execution time. But when using the Language Integrated Query (LINQ) API, how does the compiler know whether to generate a delegate, to execute a query locally, or to generate an expression tree so that information about the query can be sent to the remote database server?
The methods used to build LINQ queries, such as Where(), are extension methods. The versions of those methods that extend the IEnumerable<T> interface take delegate parameters; the methods that extend the IQueryable<T> interface take expression tree parameters. The compiler, therefore, can use the type of the collection that is being queried to determine whether to create delegates or expression trees from lambdas supplied as arguments.
Consider, for example, the Where() method in the following code:
persons.Where( person => person.Name.ToUpper() ==
"INIGO MONTOYA");
The extension method signature declared in the System.Linq.Enumerable class is
public IEnumerable<TSource> Where<TSource>(
this IEnumerable<TSource> collection,
Func<TSource, bool> predicate);
The extension method signature declared in the System.Linq.Queryable class is
public IQueryable<TSource> Where<TSource>(
this IQueryable<TSource> collection,
Expression<Func<TSource, bool>> predicate);
The compiler decides which extension method to use on the basis of the compile-time type of persons; if it is a type convertible to IQueryable<Person>, the method from System.Linq.Queryable is chosen. It converts the lambda to an expression tree. At execution time, the object referred to by persons receives the expression tree data and might use that data to build a SQL query, which is then passed to the database when the results of the query are requested. The result of the call to Where is an object that, when asked for query results, sends the query to the database and produces the results.
If persons cannot be converted implicitly to IQueryable<Person> but can be converted implicitly to IEnumerable<Person>, the method from System.Linq.Enumerable is chosen, and the lambda is converted to a delegate. The result of the call to Where is an object that, when asked for query results, applies the generated delegate as a predicate to every member of the collection and produces the results that match the predicate.
Converting a lambda expression to an Expression<TDelegate> creates an expression tree rather than a delegate, as noted previously. Earlier in this chapter, we also explored how to convert a lambda such as (x,y)=>x>y to a delegate type such as Func<int, int, bool>. To turn this same lambda into an expression tree, we simply convert it to Expression<Func<int, int, bool>>, as shown in Listing 13.26. We can then examine the generated object and display information about its structure, as well as that of a more complex expression tree.
Note that passing an instance of the expression tree to Console.WriteLine() automatically converts the expression tree to a descriptive string form; the objects generated for expression trees all override ToString() so that you can see the contents of an expression tree at a glance when debugging.
In Output 13.5, we see that the Console.WriteLine() statements within Main() print out the body of the expression trees as text.
The important point to note is that an expression tree is a collection of data, and by iterating over the data, it is possible to convert the data to another format. In this case, we convert the expression tree to descriptive strings, but it could also be converted to expressions in another query language.
Using recursion, the PrintNode() function demonstrates that nodes in an expression tree are themselves trees containing zero or more child expression trees. The root tree that represents the lambda refers to the expression that is the body of the lambda with its Body property. Every expression tree node includes a NodeType property of enumerated type ExpressionType that describes what kind of expression it is. Numerous types of expressions exist: BinaryExpression, ConditionalExpression, LambdaExpression, MethodCallExpression, ParameterExpression, and ConstantExpression are examples. Each type derives from Expression.
Although the expression tree library now contains objects to represent most of the statements of C# and Visual Basic, neither of these languages actually supports the conversion of statement lambdas to expression trees. Only expression lambdas can be converted to expression trees.