When returning an array or collection, you must indicate that there are zero items by returning either null or a collection instance with no items. The better choice, in general, is to return a collection instance with no items. In so doing, you avoid forcing the caller to check for null before iterating over the items in the collection. For example, given a zero-size IEnumerable<T> collection, the caller can immediately and safely use a foreach loop over the collection without concern that the generated call to GetEnumerator() will throw a NullReferenceException. Consider using the Enumerable.Empty<T>() method to easily generate an empty collection of a given type.
One of the few times to deviate from this guideline is when null is intentionally indicating something different from zero items. For example, a collection of user names for a website might be null to indicate that an up-to-date collection could not be obtained for some reason; that is semantically different from an empty collection.
Chapter 15 went into detail on the internals of the foreach loop. This section discusses how to use iterators to create your own implementation of the IEnumerator<T>, IEnumerable<T>, and corresponding nongeneric interfaces for custom collections. Iterators provide clean syntax for specifying how to iterate over data in collection classes, especially using the foreach loop. The iterator allows end users of a collection to navigate its internal structure without knowledge of that structure.
If classes want to support iteration using the foreach loop construct, they must implement the enumerator pattern. As Chapter 15 describes, in C# the foreach loop construct is expanded by the compiler into the while loop construct based on the IEnumerator<T> interface that is retrieved from the IEnumerable<T> interface.
The problem with the enumeration pattern is that it can be cumbersome to implement manually, because it must maintain all the state necessary to describe the current position in the collection. This internal state may be simple for a list collection type class; the index of the current position usually suffices. In contrast, for data structures that require recursive traversal, such as binary trees, the state can be quite complicated.4
Iterators are a means to implement methods of a class, and they are syntactic shortcuts for the more complex enumerator pattern. When the C# compiler encounters an iterator, it expands its contents into CIL code that implements the enumerator pattern. As such, there are no runtime dependencies for implementing iterators. Because the C# compiler handles implementation through CIL code generation, there is no real runtime performance benefit from using iterators. However, a substantial programmer productivity gain may be realized by choosing iterators over the manual implementation of the enumerator pattern. To understand how this improvement arises, we must first consider how an iterator is defined in code.
An iterator provides a shorthand implementation of iterator interfaces, the combination of the IEnumerable<T> and IEnumerator<T> interfaces. Listing 17.13 declares an iterator for the generic BinaryTree<T> type by creating a GetEnumerator() method. Next, you add support for the iterator interfaces.
As Listing 17.13 shows, we need to provide an implementation for the GetEnumerator() method.
Iterators are like functions, but instead of returning a single value, they yield a sequence of values, one at a time. In the case of BinaryTree<T>, the iterator yields a sequence of values of the type argument provided for T. If the nongeneric version of IEnumerator is used, the yielded values are instead of type object.
To correctly implement the iterator pattern, you need to maintain some internal state to keep track of where you are while enumerating the collection. In the BinaryTree<T> case, you track which elements within the tree have already been enumerated and which are still to come. Iterators are transformed by the compiler into a “state machine” that keeps track of the current position and knows how to “move itself” to the next position.
The yield return statement yields a value each time an iterator encounters it; control immediately returns to the caller that requested the item. (An interesting point here is that control really does immediately return, unlike, ironically, the return statement. A return statement runs finally blocks along the way; yield return does not.) When the caller requests the next item, the code begins to execute immediately following the previously executed yield return statement. In Listing 17.14, you return the C# built-in data type keywords sequentially. Results are shown in Output 17.5.
The output from this listing is a listing of the C# built-in types.
When GetEnumerator() is first called in a foreach statement, such as foreach (string keyword in keywords) in Listing 17.14, an iterator object is created and its state is initialized to a special “start” state that represents the fact that no code has executed in the iterator and, therefore, no values have been yielded yet. The iterator maintains its state as long as the foreach statement at the call site continues to execute. Every time the loop requests the next value, control enters the iterator and continues where it left off the previous time around the loop; the state information stored in the iterator object is used to determine where control must resume. When the foreach statement at the call site terminates, the iterator’s state is no longer saved.
It is always safe to call GetEnumerator() again; “fresh” enumerator objects will be created when necessary.
Figure 17.8 shows a high-level sequence diagram of what takes place. Remember that the MoveNext() method appears on the IEnumerator<T> interface.
In Listing 17.13, the foreach statement at the call site initiates a call to GetEnumerator() on the CSharpBuiltInTypes instance called keywords. Given the iterator instance (referenced by iterator), foreach begins each iteration with a call to MoveNext(). Within the iterator, you yield a value back to the foreach statement at the call site. After the yield return statement, the GetEnumerator() method seemingly pauses until the next MoveNext() request. Back at the loop body, the foreach statement displays the yielded value on the screen. It then loops back around and calls MoveNext() on the iterator again. Notice that the second time, control picks up at the second yield return statement. Once again, the foreach displays on the screen what CSharpBuiltInTypes yielded and starts the loop again. This process continues until there are no more yield return statements within the iterator. At that point, the foreach loop at the call site terminates because MoveNext() returns false.
Before you modify BinaryTree<T>, you must modify Pair<T> to support the IEnumerable<T> interface using an iterator. Listing 17.15 is an example that yields each element in Pair<T>.
In Listing 17.15, the iteration over the Pair<T> data type loops twice: first through yield return First and then through yield return Second. Each time the yield return statement within GetEnumerator() is encountered, the state is saved and execution appears to “jump” out of the GetEnumerator() method context and into the loop body. When the second iteration starts, GetEnumerator() begins to execute again with the yield return Second statement.
System.Collections.Generic.IEnumerable<T> inherits from System.Collections.IEnumerable. Therefore, when implementing IEnumerable<T>, it is also necessary to implement IEnumerable. In Listing 17.15, you do so explicitly, and the implementation simply involves a call to IEnumerable<T>’s GetEnumerator() implementation. This call from IEnumerable.GetEnumerator() to IEnumerable<T>.GetEnumerator() will always work because of the type compatibility (via inheritance) between IEnumerable<T> and IEnumerable. Since the signatures for both versions of GetEnumerator() are identical (the return type does not distinguish a signature), one or both implementations must be explicit. Given the additional type safety offered by IEnumerable<T>’s version, you implement IEnumerable’s implementation explicitly.
Listing 17.16 uses the Pair<T>.GetEnumerator() method and displays "Inigo" and "Montoya" on two consecutive lines.
Notice that the call to GetEnumerator() is implicit within the foreach loop.
It is not necessary to hardcode each yield return statement, as you did in both CSharpBuiltInTypes and Pair<T>. Using the yield return statement, you can return values from inside a loop construct. Listing 17.17 uses a foreach loop. Each time the foreach within GetEnumerator() executes, it returns the next value.
In Listing 17.17, the first iteration returns the root element within the binary tree. During the second iteration, you traverse the pair of subelements. If the subelement pair contains a non-null value, you traverse into that child node and yield its elements. Note that foreach (T item in tree) is a recursive call to a child node.
As observed with CSharpBuiltInTypes and Pair<T>, you can now iterate over BinaryTree<T> using a foreach loop. Listing 17.18 demonstrates this process, and Output 17.6 shows the results.
Sometimes you might want to cancel further iteration. You can do so by including an if statement so that no further statements within the code are executed. However, you can also use yield break to cause MoveNext() to return false and control to return immediately to the caller and end the loop. Listing 17.19 shows an example of such a method.
This method cancels the iteration if either of the elements in the Pair<T> class is null.
A yield break statement is similar to placing a return statement at the top of a function when it is determined that there is no work to do. It is a way to exit from further iterations without surrounding all remaining code with an if block. As such, it allows multiple exits. Use it with caution, because a casual reading of the code may overlook the early exit.
Previous iterator examples implemented IEnumerable<T>.GetEnumerator(), the method that foreach seeks implicitly. Sometimes you might want different iteration sequences, such as iterating in reverse, filtering the results, or iterating over an object projection other than the default. You can declare additional iterators in the class by encapsulating them within properties or methods that return IEnumerable<T> or IEnumerable. If you want to iterate over the elements of Pair<T> in reverse, for example, you could provide a GetReverseEnumerator() method, as shown in Listing 17.21.
Note that you return IEnumerable<T>, not IEnumerator<T>. This is different from IEnumerable<T>.GetEnumerator(), which returns IEnumerator<T>. The code in Main() demonstrates how to call GetReverseEnumerator() using a foreach loop.
You can use the yield return statement only in members that return an IEnumerator<T> or IEnumerable<T> type, or their nongeneric equivalents. Members whose bodies include a yield return statement may not have a simple return. If the member uses the yield return statement, the C# compiler generates the necessary code to maintain the state of the iterator. In contrast, if the member uses the return statement instead of yield return, the programmer is responsible for maintaining his own state machine and returning an instance of one of the iterator interfaces. Further, just as all code paths in a method with a return type must contain a return statement accompanied by a value (assuming they don’t throw an exception), so all code paths in an iterator must contain a yield return statement if they are to return any data.
The following additional restrictions on the yield statement result in compiler errors if they are violated: