Essential C#: More Standard Query Operators

Build: 6 Nov, 2024 9:02:02 AM

Standard Query Operators

Besides the methods on System.Object, any type that implements IEnumerable<T> is required to implement only one other method, GetEnumerator(). Yet doing so makes more than 50 methods available to all types implementing IEnumerable<T>, not including any overloads, and this happens without needing to explicitly implement any method except GetEnumerator(). The additional functionality is provided through extension methods and resides in the class System.Linq.Enumerable. Therefore, including the using directive for System.Linq is all it takes to make these methods available.
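As a minimal sketch of this point, consider a sequence whose only capability is enumeration. The Countdown() iterator and its data are hypothetical names chosen for illustration; they do not come from the chapter's listings. The query operators used on it are supplied entirely by System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sequence: the compiler-generated iterator implements
// IEnumerable<int> (and therefore GetEnumerator()) and nothing more.
static IEnumerable<int> Countdown(int start)
{
    for (int i = start; i > 0; i--)
    {
        yield return i;
    }
}

IEnumerable<int> numbers = Countdown(5);

// Where(), Count(), and Select() are extension methods from
// System.Linq.Enumerable; Countdown() never defines them.
int evenCount = numbers.Where(n => n % 2 == 0).Count();
string scaled = string.Join(", ", numbers.Select(n => n * 10));

Console.WriteLine(evenCount); // 2
Console.WriteLine(scaled);    // 50, 40, 30, 20, 10
```

Removing the using System.Linq directive from this sketch would make the calls to Where(), Count(), and Select() fail to compile, since the sequence itself declares none of them.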

Each method on IEnumerable<T> is a standard query operator; it provides querying capability over the collection on which it operates. In the following sections, we examine some of the most prominent of these standard query operators. Many of these examples depend on an Inventor and/or Patent class, both of which are defined in Listing 15.9 with Output 15.1.

Listing 15.9: Sample Classes for Use with Standard Query Operators    

using System;            

using System.Collections.Generic;            

public class Program            

{            

    public static void Main()            

    {            

        IEnumerable<Patent> patents = PatentData.Patents;            

        Print(patents);            

        Console.WriteLine();            

        IEnumerable<Inventor> inventors = PatentData.Inventors;            

        Print(inventors);            

    }            

    private static void Print<T>(IEnumerable<T> items)            

    {            

        foreach(T item in items)            

        {            

            Console.WriteLine(item);            

        }            

    }            

}

Output 15.1

Bifocals (1784)

Phonograph (1877)

Kinetoscope (1888)

Electrical Telegraph (1837)

Flying Machine (1903)

Steam Locomotive (1815)

Droplet Deposition Apparatus (1989)

Backless Brassiere (1914)

Benjamin Franklin (Philadelphia, PA)

Orville Wright (Kitty Hawk, NC)

Wilbur Wright (Kitty Hawk, NC)

Samuel Morse (New York, NY)

George Stephenson (Wylam, Northumberland)

John Michaelis (Chicago, IL)

Mary Phelps Jacob (New York, NY)

Filtering with Where()

To filter out data from a collection, we need to provide a filter method that returns true or false, indicating whether or not a particular element should be included. A delegate expression that takes an argument and returns a Boolean value is called a predicate, and a collection’s Where() method depends on predicates for identifying filter criteria, as shown in Listing 15.10. (Technically, the result of the Where() method is an object that encapsulates the operation of filtering a given sequence with a given predicate.) The results appear in Output 15.2.

Listing 15.10: Filtering with System.Linq.Enumerable.Where()    

using System;            

using System.Collections.Generic;            

using System.Linq;            

public class Program            

{            

    public static void Main()            

    {            

        IEnumerable<Patent> patents = PatentData.Patents;            

        patents = patents.Where(            

            patent => patent.YearOfPublication.StartsWith("18"));            

        Print(patents);            

    }            

    // ...            

}

Output 15.2

Phonograph (1877)

Kinetoscope (1888)

Electrical Telegraph (1837)

Steam Locomotive (1815)

Notice that the code assigns the output of the Where() call back to IEnumerable<T>. In other words, the output of IEnumerable<T>.Where() is a new IEnumerable<T> collection. In Listing 15.10, it is IEnumerable<Patent>.

Less obvious is that the Where() expression argument has not necessarily been executed at assignment time. This is true for many of the standard query operators. In the case of Where(), for example, the expression is passed into the collection and “saved” but not executed. Instead, execution of the expression occurs only when it is necessary to begin iterating over the items within the collection. For example, a foreach loop, such as the one in Print() (in Listing 15.9), can trigger the expression to be evaluated for each item within the collection. At least conceptually, the Where() method should be understood as a means of specifying the query regarding what appears in the collection, not the actual work involved with iterating over the items to produce a new collection with potentially fewer items.
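The timing described above can be observed with a small sketch. It assumes a hypothetical array of year strings in place of the Patent data and uses a counter inside the predicate purely to make the invocations visible (a side effect that production predicates should avoid).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for patent publication years.
string[] years = { "1784", "1877", "1888", "1903" };

int predicateCalls = 0;

// Where() only stores the predicate; nothing is evaluated yet.
IEnumerable<string> from1800s = years.Where(year =>
{
    predicateCalls++; // side effect used only to observe timing
    return year.StartsWith("18");
});

int callsBeforeIteration = predicateCalls;

// Iterating is what triggers the predicate, once per source element.
List<string> results = new();
foreach (string year in from1800s)
{
    results.Add(year);
}

Console.WriteLine($"Calls before iteration: {callsBeforeIteration}"); // 0
Console.WriteLine($"Calls after iteration: {predicateCalls}");        // 4
Console.WriteLine(string.Join(", ", results));                        // 1877, 1888
```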

Projecting with Select()

Since the output from the IEnumerable<T>.Where() method is a new IEnumerable<T> collection, it is possible to again call a standard query operator on the same collection. For example, rather than just filtering the data from the original collection, we could transform the data (see Listing 15.11).

Listing 15.11: Projection with System.Linq.Enumerable.Select()    

using System;            

using System.Collections.Generic;            

using System.Linq;            

public class Program            

{            

    public static void Main()            

    {            

        IEnumerable<Patent> patents = PatentData.Patents;            

        IEnumerable<Patent> patentsOf1800 = patents.Where(            

            patent => patent.YearOfPublication.StartsWith("18"));            

        IEnumerable<string> items = patentsOf1800.Select(            

            patent => patent.ToString());            

        Print(items);            

    }            

    // ...            

}

In Listing 15.11, we create a new IEnumerable<string> collection. In this case, it just so happens that adding the Select() call doesn’t change the output—but only because Print()’s Console.WriteLine() call used ToString() anyway. Obviously, a transform still occurred on each item from the Patent type of the original collection to the string type of the items collection.

Consider the example using System.IO.FileInfo in Listing 15.12.

Listing 15.12: Projection with System.Linq.Enumerable.Select() and new    

//...            

IEnumerable<string> fileList = Directory.EnumerateFiles(            

    rootDirectory, searchPattern);            

IEnumerable<FileInfo> files = fileList.Select(            

    file => new FileInfo(file));            

//...

Here fileList is of type IEnumerable<string>. However, using the projection offered by Select, we can transform each item in the collection to a System.IO.FileInfo object.

Lastly, capitalizing on tuples, we can create an IEnumerable<T> collection where T is a tuple (see Listing 15.13 and Output 15.3).

Listing 15.13: Projection to Tuple    

//...            

IEnumerable<string> fileList = Directory.EnumerateFiles(            

    rootDirectory, searchPattern);            

IEnumerable<(string FileName, long Size)> items = fileList.Select(            

    file =>            

    {            

        FileInfo fileInfo = new(file);            

        return (            

            FileName: fileInfo.Name,            

            Size: fileInfo.Length            

        );            

    });            

//...

Output 15.3

FileName = AssemblyInfo.cs, Size = 1704

FileName = CodeAnalysisRules.xml, Size = 735

FileName = CustomDictionary.xml, Size = 199

FileName = EssentialCSharp.sln, Size = 40415

FileName = EssentialCSharp.suo, Size = 454656

FileName = EssentialCSharp.vsmdi, Size = 499

FileName = EssentialCSharp.vssscc, Size = 256

FileName = intelliTechture.ConsoleTester.dll, Size = 24576

FileName = intelliTechture.ConsoleTester.pdb, Size = 30208

The output of an anonymous type automatically shows the property names and their values as part of the generated ToString() method associated with the anonymous type.

Projection using the Select() method is very powerful. We already saw how to filter a collection vertically (reducing the number of items in the collection) using the Where() standard query operator. Now, via the Select() standard query operator, we can also reduce the collection horizontally (making fewer columns) or transform the data entirely. In combination, Where() and Select() provide a means for extracting only those pieces of the original collection that are desirable for the current algorithm. These two methods alone provide a powerful collection manipulation API that would otherwise result in significantly more—and less readable—code.
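A compact sketch of the combination follows. It assumes hypothetical (Title, Year) tuples rather than the chapter's Patent class, with values borrowed from Output 15.1. Where() reduces the rows and Select() reduces the columns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Patent data: (Title, Year) tuples.
(string Title, int Year)[] patents =
{
    ("Bifocals", 1784),
    ("Phonograph", 1877),
    ("Kinetoscope", 1888),
    ("Flying Machine", 1903),
};

// Where() keeps only the 1800s rows; Select() keeps only the Title column.
IEnumerable<string> titlesFrom1800s = patents
    .Where(patent => patent.Year >= 1800 && patent.Year < 1900)
    .Select(patent => patent.Title);

foreach (string title in titlesFrom1800s)
{
    Console.WriteLine(title);
}
// Prints:
// Phonograph
// Kinetoscope
```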

Advanced Topic

Running LINQ Queries in Parallel

With the widespread availability of computers having multiple processors and multiple cores within those processors, the ability to easily take advantage of the additional processing power becomes far more important. To do so, programs need to support multiple threads so that work can happen simultaneously on different CPUs within the computer. Listing 15.14 demonstrates one way to do this using Parallel LINQ (PLINQ).

Listing 15.14: Executing LINQ Queries in Parallel    

//...            

IEnumerable<string> fileList = Directory.EnumerateFiles(            

    rootDirectory, searchPattern);            

var items = fileList.AsParallel().Select(            

    file =>            

    {            

        FileInfo fileInfo = new(file);            

        return new            

        {            

            FileName = fileInfo.Name,            

            Size = fileInfo.Length            

        };            

    });            

//...

As Listing 15.14 shows, only a minimal change in code is needed to enable parallel support. All that this example uses is a Microsoft .NET Framework 4–introduced standard query operator, AsParallel(), on the static class System.Linq.ParallelEnumerable. Using this simple extension method, the runtime begins executing over the items within the fileList collection and returning the resultant objects in parallel. Each parallel operation in this case isn’t particularly expensive (although it is relative to the other execution taking place), but consider the burden imposed by CPU-intensive operations such as encryption or compression. Running the query in parallel across multiple CPUs can decrease execution time by a factor corresponding to the number of CPU cores.

An important caveat to be aware of (and the reason why AsParallel() appears as an Advanced Topic rather than in the standard text) is that parallel execution can introduce race conditions, such that an operation on one thread can be intermingled with an operation on a different thread, causing data corruption. To avoid this problem, synchronization mechanisms are required on data with shared access from multiple threads to force the operations to be atomic where necessary. Synchronization itself, however, can introduce deadlocks that freeze execution, further complicating effective parallel programming.
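One way to sidestep the race conditions described above is to keep the lambda free of shared mutable state and let PLINQ combine the partial results. The following sketch assumes a hypothetical workload (squaring numbers) chosen only because its result is easy to check; it is not taken from the chapter's listings.

```csharp
using System;
using System.Linq;

// Hypothetical CPU-bound work: squaring each number from 1 to 1000.
int[] numbers = Enumerable.Range(1, 1000).ToArray();

// Sequential baseline.
long sequentialSum = numbers.Select(n => (long)n * n).Sum();

// Parallel version: the lambda reads only its own argument and writes
// no shared variable, so no explicit synchronization is needed.
// PLINQ combines the per-thread partial sums itself.
long parallelSum = numbers
    .AsParallel()
    .Select(n => (long)n * n)
    .Sum();

Console.WriteLine(sequentialSum);                // 333833500
Console.WriteLine(sequentialSum == parallelSum); // True
```

Had the lambda instead incremented a shared counter or appended to a shared List<T>, the result could vary from run to run unless that access were synchronized.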

More details on this and additional multithreading topics are provided in Chapters 19 through 22.

Counting Elements with Count()

Another query frequently performed on a collection of items is to retrieve the count. To support this type of query, LINQ includes the Count() extension method.

Listing 15.15 demonstrates that Count() is overloaded to simply count all elements (no parameters) or to take a predicate that counts only items identified by the predicate expression.

Listing 15.15: Counting Items with Count()    

using System;            

using System.Collections.Generic;            

using System.Linq;            

public class Program            

{            

    public static void Main()            

    {            

        IEnumerable<Patent> patents = PatentData.Patents;            

        Console.WriteLine($"Patent Count: { patents.Count() }");            

        Console.WriteLine($@"Patent Count in 1800s: {            

            patents.Count(patent =>            

                patent.YearOfPublication.StartsWith("18"))}");

    }

    // ...

}

In spite of the apparent simplicity of the Count() statement, IEnumerable<T> has not changed, so the executed code still iterates over all the items in the collection. Whenever a Count property is directly available on the collection, it is preferable to use that rather than LINQ’s Count() method (a subtle difference). Fortunately, ICollection<T> includes the Count property, so code that calls the Count() method on a collection that supports ICollection<T> casts the collection and calls Count directly. However, if ICollection<T> is not supported, Enumerable.Count() proceeds to enumerate all the items in the collection rather than call the built-in Count mechanism. If the purpose of checking the count is just to see whether it is greater than zero (if(patents.Count() > 0){...}), the preferred approach would be to use the Any() operator (if(patents.Any()){...}). Any() attempts to iterate over only one of the items in the collection to return a true result, rather than iterating over the entire sequence.

Guidelines

DO use System.Linq.Enumerable.Any() rather than calling patents.Count() when checking whether there are more than zero items.

DO use a collection’s Count property (if available) instead of calling the System.Linq.Enumerable.Count() method.
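The difference between the two checks can be made visible with a sketch. It assumes a hypothetical iterator, Numbers(), that counts how many items it has produced; because an iterator does not implement ICollection<T>, Count() has no Count property to fall back on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int itemsProduced = 0;

// Hypothetical sequence that is not an ICollection<T>, so Count()
// must enumerate it. The counter records how far enumeration goes.
IEnumerable<int> Numbers()
{
    for (int i = 1; i <= 1000; i++)
    {
        itemsProduced++;
        yield return i;
    }
}

bool hasItemsViaCount = Numbers().Count() > 0;
int producedForCount = itemsProduced;

itemsProduced = 0;
bool hasItemsViaAny = Numbers().Any();
int producedForAny = itemsProduced;

Console.WriteLine($"Count() > 0 enumerated {producedForCount} items"); // 1000
Console.WriteLine($"Any() enumerated {producedForAny} item(s)");       // 1
```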

Deferred Execution

One of the most important concepts to remember when using LINQ is deferred execution. Consider the code in Listing 15.16 and the corresponding output in Output 15.4.

Listing 15.16: Filtering with System.Linq.Enumerable.Where()    

using System;            

using System.Collections.Generic;            

using System.Linq;            

// ...            

        IEnumerable<Patent> patents = PatentData.Patents;            

        bool result;            

        patents = patents.Where(            

            patent =>            

            {            

                if(result =            

                    patent.YearOfPublication.StartsWith("18"))            

                {            

                    // Side effects like this in a predicate            

                    // are used here to demonstrate a             

                    // principle and should generally be            

                    // avoided            

                    Console.WriteLine("\t" + patent);            

                }            

                return result;            

            });            

        Console.WriteLine("1. Patents prior to the 1900s are:");            

        foreach(Patent patent in patents)            

        {            

        }            

        Console.WriteLine();            

        Console.WriteLine(            

            "2. A second listing of patents prior to the 1900s:");            

        Console.WriteLine(            

            $@"   There are { patents.Count()            

                } patents prior to 1900.");            

        Console.WriteLine();            

        Console.WriteLine(            

            "3. A third listing of patents prior to the 1900s:");            

        patents = patents.ToArray();            

        Console.Write("   There are ");            

        Console.WriteLine(            

            $"{ patents.Count() } patents prior to 1900.");            

        //...

Output 15.4

1. Patents prior to the 1900s are:

Phonograph (1877)

Kinetoscope (1888)

Electrical Telegraph (1837)

Steam Locomotive (1815)

2. A second listing of patents prior to the 1900s:

Phonograph (1877)

Kinetoscope (1888)

Electrical Telegraph (1837)

Steam Locomotive (1815)

There are 4 patents prior to 1900.

3. A third listing of patents prior to the 1900s:

Phonograph (1877)

Kinetoscope (1888)

Electrical Telegraph (1837)

Steam Locomotive (1815)

There are 4 patents prior to 1900.

Notice that Console.WriteLine("1. Patents prior...) executes before the lambda expression. This characteristic is very important to recognize because it is not obvious to those who are unaware of its importance. In general, predicates should do exactly one thing—evaluate a condition—and should not have any side effects (even printing to the console, as in this example).

To understand what is happening, recall that lambda expressions are delegates—references to methods—that can be passed around. In the context of LINQ and standard query operators, each lambda expression forms part of the overall query to be executed.

At the time of declaration, lambda expressions are not executed. In fact, it isn’t until the lambda expressions are invoked that the code within them begins to execute. Figure 15.2 shows the sequence of operations.

Figure 15.2: Sequence of operations invoking lambda expressions

As Figure 15.2 shows, three calls in Listing 15.16 trigger the lambda expression, and each time it is fairly implicit. If the lambda expression were expensive (such as a call to a database), it would therefore be important to minimize the lambda expression’s execution.

First, the execution is triggered within the foreach loop. As described earlier in the chapter, the foreach loop breaks down into a MoveNext() call, and each call results in the lambda expression’s execution for each item in the original collection. While iterating, the runtime invokes the lambda expression for each item to determine whether the item satisfies the predicate.

Second, a call to Enumerable’s Count() (the function) triggers the lambda expression for each item once more. Again, this is subtle behavior because Count (the property) is very common on collections that have not been queried with a standard query operator.

Third, the call to ToArray() (or ToList(), ToDictionary(), or ToLookup()) evaluates the lambda expression for each item. However, converting the collection with one of these “To” methods is extremely helpful. Doing so returns a collection on which the standard query operator has already executed. In Listing 15.16, the conversion to an array means that when Count() is called in the final Console.WriteLine(), the underlying object pointed to by patents is, in fact, an array (which implements both IEnumerable<T> and ICollection<T>); in turn, the array’s own count is returned rather than the query being enumerated again. Consequently, following a conversion to one of the collection types returned by a “To” method, it is generally safe to work with the collection (until another standard query operator is called). However, be aware that this will bring the entire result set into memory (it may have been backed by a database or file prior to this step). Furthermore, the “To” method will take a snapshot of the underlying data, so that no fresh results will be returned upon requerying the “To” method result.
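The snapshot behavior can be sketched as follows, assuming a hypothetical List<int> of years as the underlying data. The query object keeps reflecting changes to its source, whereas the array produced by ToArray() does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical source data.
List<int> years = new() { 1784, 1877, 1888 };

IEnumerable<int> query = years.Where(year => year >= 1800);
int[] snapshot = query.ToArray(); // executes the query now

years.Add(1837); // change the source after the snapshot was taken

// The query re-executes against the current source; the array does not.
int liveCount = query.Count();
Console.WriteLine($"Query count: {liveCount}");           // 3
Console.WriteLine($"Snapshot length: {snapshot.Length}"); // 2
```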

We strongly encourage you to review the sequence diagram in Figure 15.2 along with the corresponding code and recognize that the deferred execution of standard query operators can result in extremely subtle triggering of the standard query operators; therefore, developers should use caution and seek to avoid unexpected calls. The query object represents the query, not the results. When you ask the query for the results, the whole query executes (perhaps even again) because the query object doesn’t know that the results will be the same as they were during a previous execution (if one existed).

Note

To avoid such repeated execution, you must cache the data retrieved by the executed query. To do so, assign the data to a local collection using one of the “To” collection methods. During the assignment call of a “To” method, the query obviously executes. However, iterating over the assigned collection after that point will not involve the query expression any further. In general, if you want the behavior of an in-memory collection snapshot, it is a best practice to assign a query expression to a cached collection to avoid unnecessary iterations.
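A brief sketch of the caching advice follows. It assumes hypothetical year strings and a counter in the predicate (used only to observe how often the predicate runs) and compares reusing the query with reusing a cached list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] years = { "1784", "1877", "1888", "1903" };
int predicateCalls = 0;

IEnumerable<string> query = years.Where(year =>
{
    predicateCalls++; // side effect used only to count invocations
    return year.StartsWith("18");
});

// Uncached: every Count() call re-runs the predicate over all four items.
int firstCount = query.Count();
int secondCount = query.Count();
int callsUncached = predicateCalls;

// Cached: ToList() runs the predicate once per item; afterward the
// list's Count property involves no query execution at all.
predicateCalls = 0;
List<string> cached = query.ToList();
int thirdCount = cached.Count;
int fourthCount = cached.Count;
int callsCached = predicateCalls;

Console.WriteLine($"Uncached predicate calls: {callsUncached}"); // 8
Console.WriteLine($"Cached predicate calls: {callsCached}");     // 4
```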

Sorting with OrderBy() and ThenBy()

Another common operation on a collection is to sort it. Sorting involves a call to System.Linq.Enumerable’s OrderBy(), as shown in Listing 15.17 and Output 15.5.

Listing 15.17: Ordering with System.Linq.Enumerable.OrderBy()/ThenBy()    

using System;            

using System.Collections.Generic;            

using System.Linq;            

// ...            

        IEnumerable<Patent> items;            

        Patent[] patents = PatentData.Patents;            

        items = patents.OrderBy(            

                patent => patent.YearOfPublication)            

            .ThenBy(            

                patent => patent.Title);            

        Print(items);            

        Console.WriteLine();            

        items = patents.OrderByDescending(            

                patent => patent.YearOfPublication)            

            .ThenByDescending(            

                patent => patent.Title);            

        Print(items);            

        //...

Output 15.5

Bifocals (1784)

Steam Locomotive (1815)

Electrical Telegraph (1837)

Phonograph (1877)

Kinetoscope (1888)

Flying Machine (1903)

Backless Brassiere (1914)

Droplet Deposition Apparatus (1989)

Backless Brassiere (1914)

Flying Machine (1903)

Kinetoscope (1888)

Phonograph (1877)

Electrical Telegraph (1837)

Steam Locomotive (1815)

Bifocals (1784)

The OrderBy() call takes a lambda expression that identifies the key on which to sort. In Listing 15.17, the initial sort uses the year in which the patent was published.

Notice that the OrderBy() call takes only a single parameter, keySelector, to sort on. To sort on a second column, it is necessary to use a different method: ThenBy(). Similarly, code would use ThenBy() for any additional sorting.

OrderBy() returns an IOrderedEnumerable<T> interface, not an IEnumerable<T>. Furthermore, IOrderedEnumerable<T> derives from IEnumerable<T>, so all the standard query operators (including OrderBy()) are available on the OrderBy() return. However, repeated calls to OrderBy() would undo the work of the previous call such that the end result would sort by only the keySelector in the final OrderBy() call. For this reason, you should be careful not to call OrderBy() on a previous OrderBy() call.

Instead, you should specify additional sorting criteria using ThenBy(). Although ThenBy() is an extension method, it is not an extension of IEnumerable<T> but rather of IOrderedEnumerable<T>. The method, also defined on System.Linq.Enumerable, is declared as follows:

public static IOrderedEnumerable<TSource>

    ThenBy<TSource, TKey>(

        this IOrderedEnumerable<TSource> source,

        Func<TSource, TKey> keySelector)

In summary, use OrderBy() first, followed by zero or more calls to ThenBy() to provide additional sorting “columns.” The methods OrderByDescending() and ThenByDescending() provide the same functionality except that they sort items in descending order. Mixing and matching ascending and descending methods is not a problem, but if sorting items further, you would use a ThenBy() call (either ascending or descending).

Two more important notes about sorting are warranted. First, the actual sort doesn’t occur until you begin to access the members in the collection, at which point the entire query is processed. You can’t sort unless you have all the items to sort, because you can’t determine whether you have the first item. The fact that sorting is delayed until you begin to access the members is due to deferred execution, as described earlier in this chapter. Second, each subsequent call to sort the data (e.g., OrderBy() followed by ThenBy() followed by ThenByDescending()) does involve additional calls to the keySelector lambda expression of the earlier sorting calls. In other words, a call to OrderBy() calls its corresponding keySelector lambda expression once you iterate over the collection. Furthermore, a subsequent call to ThenBy() again makes calls to OrderBy()’s keySelector.

Guidelines

DO NOT call an OrderBy() following a prior OrderBy() method call. Use ThenBy() to sequence items by more than one value.
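The guideline can be illustrated with a short sketch. It assumes hypothetical (Name, City) tuples built from a few of the inventors in Output 15.1, sorted by city and then by name.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the Inventor data: (Name, City) tuples.
(string Name, string City)[] inventors =
{
    ("Wilbur Wright", "Kitty Hawk"),
    ("Samuel Morse", "New York"),
    ("Orville Wright", "Kitty Hawk"),
    ("Mary Phelps Jacob", "New York"),
};

// Intended: City is the primary key and Name breaks ties within a city.
string[] withThenBy = inventors
    .OrderBy(inventor => inventor.City)
    .ThenBy(inventor => inventor.Name)
    .Select(inventor => inventor.Name)
    .ToArray();

// Mistake: the second OrderBy() re-sorts the whole sequence by Name,
// so the grouping by City from the first call is lost.
string[] withSecondOrderBy = inventors
    .OrderBy(inventor => inventor.City)
    .OrderBy(inventor => inventor.Name)
    .Select(inventor => inventor.Name)
    .ToArray();

Console.WriteLine(string.Join(", ", withThenBy));
// Orville Wright, Wilbur Wright, Mary Phelps Jacob, Samuel Morse
Console.WriteLine(string.Join(", ", withSecondOrderBy));
// Mary Phelps Jacob, Orville Wright, Samuel Morse, Wilbur Wright
```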

Beginner Topic

Join Operations

Consider two collections of objects as shown in the Venn diagram in Figure 15.3. The left circle in the diagram includes all inventors, and the right circle contains all patents. The intersection includes both inventors and patents, and a line is formed for each case where there is a match of inventors to patents. As the diagram shows, each inventor may have multiple patents and each patent can have one or more inventors. Each patent has an inventor, but in some cases, inventors do not yet have patents.

Figure 15.3: Venn diagram of inventor and patent collections

Matching up inventors within the intersection to patents is an inner join. The result is a collection of inventor/patent pairs in which both an inventor and a patent exist for each pair. A left outer join includes all the items within the left circle regardless of whether they have a corresponding patent. In this particular example, a right outer join would be the same as an inner join because there are no patents without inventors. Furthermore, the designation of left versus right is arbitrary, so there is really no distinction between left and right outer joins. A full outer join, however, would include records from both outer sides; it is relatively rare to perform a full outer join.
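To make the distinction concrete, here is a rough sketch contrasting the two joins. It assumes hypothetical tuple data (a small subset in which one inventor has no patent, with made-up Id values) and expresses the left outer join with GroupJoin(), SelectMany(), and DefaultIfEmpty(), which is one common way to write it rather than the only one.

```csharp
using System;
using System.Linq;

// Hypothetical subset: the third inventor has no matching patent here.
(int Id, string Name)[] inventors =
{
    (1, "Benjamin Franklin"),
    (2, "Samuel Morse"),
    (3, "Mary Phelps Jacob"),
};

(string Title, int InventorId)[] patents =
{
    ("Bifocals", 1),
    ("Electrical Telegraph", 2),
};

// Inner join: only inventors with a matching patent appear.
string[] inner = inventors
    .Join(
        patents,
        inventor => inventor.Id,
        patent => patent.InventorId,
        (inventor, patent) => $"{inventor.Name}: {patent.Title}")
    .ToArray();

// Left outer join: GroupJoin() keeps every inventor, and
// DefaultIfEmpty() supplies a placeholder when there are no patents.
string[] leftOuter = inventors
    .GroupJoin(
        patents,
        inventor => inventor.Id,
        patent => patent.InventorId,
        (inventor, matches) =>
            (inventor.Name, Titles: matches.Select(p => p.Title)))
    .SelectMany(
        pair => pair.Titles.DefaultIfEmpty("(no patent)"),
        (pair, title) => $"{pair.Name}: {title}")
    .ToArray();

Console.WriteLine(string.Join("; ", inner));
Console.WriteLine(string.Join("; ", leftOuter));
```

With this data, the inner join yields two pairs, while the left outer join yields three, the last being "Mary Phelps Jacob: (no patent)".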

Another important characteristic in the relationship between inventors and patents is that it is a many-to-many relationship. Each individual patent can have one or more inventors (e.g., the flying machine’s invention by both Orville and Wilbur Wright). Furthermore, each inventor can have one or more patents (e.g., Benjamin Franklin’s invention of both bifocals and the phonograph).

Another common relationship is a one-to-many relationship. For example, a company department may have many employees, but each employee can belong to only one department at a time. (As is common with one-to-many relationships, however, adding the factor of time can transform them into many-to-many relationships. A particular employee may move from one department to another so that, over time, they could potentially be associated with multiple departments, making another many-to-many relationship.)

Listing 15.18 provides a sample listing of employee and department data, and Output 15.6 shows the results.

Listing 15.18: Sample Employee and Department Data    

public class Program            

{            

    public static void Main()            

    {            

        IEnumerable<Department> departments =            

            CorporateData.Departments;            

        Print(departments);            

        Console.WriteLine();            

        IEnumerable<Employee> employees =            

            CorporateData.Employees;            

        Print(employees);            

    }            

    private static void Print<T>(IEnumerable<T> items)            

    {            

        foreach(T item in items)            

        {            

            Console.WriteLine(item);            

        }            

    }            

}

Output 15.6

Corporate

Human Resources

Engineering

Information Technology

Philanthropy

Marketing

Mark Michaelis (Chief Computer Nerd)

Michael Stokesbary (Senior Computer Wizard)

Brian Jones (Enterprise Integration Guru)

Anne Beard (HR Director)

Pat Dever (Enterprise Architect)

Kevin Bost (Programmer Extraordinaire)

Thomas Heavey (Software Architect)

Eric Edmonds (Philanthropy Coordinator)

We use this data in the example in the following section on joining data.

Performing an Inner Join with Join()

In the world of objects on the client side, relationships between objects are generally already set up. For example, the relationship between files and the directories in which they reside is preestablished with the DirectoryInfo.GetFiles() method and the FileInfo.Directory property, respectively. Frequently, however, this is not the case with data being loaded from nonobject stores. Instead, the data needs to be joined together so that you can navigate from one type of object to the next in a way that makes sense for the data.

Consider the example of employees and company departments. In Listing 15.19, we join each employee to their department and then list each employee with their corresponding department. Since each employee belongs to only one (and exactly one) department, the total number of items in the list is equal to the total number of employees—each employee appears only once (each employee is said to be normalized). Output 15.7 shows the results.

Listing 15.19: An Inner Join Using System.Linq.Enumerable.Join()    

using System;            

using System.Collections.Generic;            

using System.Linq;            

// ...            

        Department[] departments = CorporateData.Departments;            

        Employee[] employees = CorporateData.Employees;            

        IEnumerable<(int Id, string Name, string Title,            

            Department Department)> items =            

            employees.Join(            

                departments,            

                employee => employee.DepartmentId,            

                department => department.Id,            

                (employee, department) => (            

                    employee.Id,            

                    employee.Name,            

                    employee.Title,            

                    department            

                ));            

        foreach ((int Id, string Name, string Title, Department Department) item in items)            

        {            

            Console.WriteLine(            

                $"{item.Name} ({item.Title})");            

            Console.WriteLine("\t" + item.Department);            

        }            

    }            

    //...

Output 15.7

Mark Michaelis (Chief Computer Nerd)

Corporate

Michael Stokesbary (Senior Computer Wizard)

Engineering

Brian Jones (Enterprise Integration Guru)

Engineering

Anne Beard (HR Director)

Human Resources

Pat Dever (Enterprise Architect)

Information Technology

Kevin Bost (Programmer Extraordinaire)

Engineering

Thomas Heavey (Software Architect)

Engineering

Eric Edmonds (Philanthropy Coordinator)

Philanthropy

The first parameter for Join() has the name inner. It specifies the collection, departments, that employees joins to. The next two parameters are lambda expressions that specify how the two collections will connect. employee => employee.DepartmentId (with a parameter name of outerKeySelector) identifies that on each employee, the key will be DepartmentId. The next lambda expression (department => department.Id) specifies the Department’s Id property as the key—in other words, for each employee, join a department where employee.DepartmentId equals department.Id. The last parameter is the resultant item that is selected. In this case, it is a tuple with Employee’s Id, Name, and Title, as well as a Department property with the joined department object.

Notice in the output that Engineering appears multiple times—once for each employee in CorporateData. In this case, the Join() call produces a Cartesian product between all the departments and all the employees, such that a new record is created for every case where a record exists in both collections and the specified department IDs are the same. This type of join is an inner join.
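The roles of the four Join() arguments can also be seen in a self-contained sketch. It assumes hypothetical tuple data in place of the Employee and Department classes (the Id values are invented for illustration) and labels each argument with its parameter name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for CorporateData; the Id values are invented.
(int Id, string Name)[] departments =
{
    (1, "Corporate"),
    (2, "Engineering"),
    (3, "Marketing"),
};

(string Name, int DepartmentId)[] employees =
{
    ("Mark Michaelis", 1),
    ("Michael Stokesbary", 2),
    ("Brian Jones", 2),
};

IEnumerable<(string Employee, string Department)> items = employees.Join(
    departments,                        // inner
    employee => employee.DepartmentId,  // outerKeySelector
    department => department.Id,        // innerKeySelector
    (employee, department) =>           // resultSelector
        (Employee: employee.Name, Department: department.Name));

foreach ((string Employee, string Department) item in items)
{
    Console.WriteLine($"{item.Employee} ({item.Department})");
}
// Prints:
// Mark Michaelis (Corporate)
// Michael Stokesbary (Engineering)
// Brian Jones (Engineering)
```

Engineering appears once per matching employee, and Marketing, which has no employees in this sample, does not appear at all.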

The data could also be joined in reverse, such that department joins to each employee to list each department-to-employee match. Notice that the output includes more records than there are departments: There are multiple employees for each department, and the output is a record for each match. As we saw before, the Engineering department appears multiple times, once for each employee.

The code in Listing 15.20 (which produces Output 15.8) is similar to that in Listing 15.19, except that the objects, Departments and Employees, are reversed. The first parameter to Join() is employees, indicating what departments joins to. The next two parameters are lambda expressions that specify how the two collections will connect: department => department.Id for departments and employee => employee.DepartmentId for employees. As before, a join occurs whenever department.Id equals employee.DepartmentId. The final parameter specifies a tuple with long Id, string Name, and Employee Employee items. (Specifying the names in the expression is optional but used here for clarity.)

Listing 15.20: Another Inner Join with System.Linq.Enumerable.Join()    

using System;            

using System.Collections.Generic;            

using System.Linq;            

// ...            

        Department[] departments = CorporateData.Departments;            

        Employee[] employees = CorporateData.Employees;            

        IEnumerable<(long Id, string Name, Employee Employee)> items =            

            departments.Join(            

                employees,            

                department => department.Id,            

                employee => employee.DepartmentId,            

                (department, employee) => (            

                    department.Id,             

                    department.Name,             

                    Employee: employee)            

                );            

        foreach ((long Id, string Name, Employee Employee) item in items)            

        {            

            Console.WriteLine(item.Name);            

            Console.WriteLine("\t" + item.Employee);            

        }            

        //...

Output 15.8

Corporate

	Mark Michaelis (Chief Computer Nerd)

Human Resources

	Anne Beard (HR Director)

Engineering

	Michael Stokesbary (Senior Computer Wizard)

Engineering

	Brian Jones (Enterprise Integration Guru)

Engineering

	Kevin Bost (Programmer Extraordinaire)

Engineering

	Thomas Heavey (Software Architect)

Information Technology

	Pat Dever (Enterprise Architect)

Philanthropy

	Eric Edmonds (Philanthropy Coordinator)

In addition to ordering and joining a collection of objects, you might want to group objects with like characteristics. For the employee data, you might want to group employees by department, region, job title, and so forth. Listing 15.21 shows an example of how to do this with the GroupBy() standard query operator (see Output 15.9 to view the results).

Listing 15.21: Grouping Items Using System.Linq.Enumerable.GroupBy()    

using System;            

using System.Collections.Generic;            

using System.Linq;            

// ...            

        IEnumerable<Employee> employees = CorporateData.Employees;            

        IEnumerable<IGrouping<int, Employee>> groupedEmployees =            

            employees.GroupBy((employee) => employee.DepartmentId);            

        foreach(IGrouping<int, Employee> employeeGroup in            

            groupedEmployees)            

        {            

            Console.WriteLine();            

            foreach(Employee employee in employeeGroup)            

            {            

                Console.WriteLine(employee);            

            }            

            Console.WriteLine(            

              "\tCount: " + employeeGroup.Count());            

        }            

        //...

Output 15.9

Mark Michaelis (Chief Computer Nerd)

	Count: 1

Michael Stokesbary (Senior Computer Wizard)

Brian Jones (Enterprise Integration Guru)

Kevin Bost (Programmer Extraordinaire)

Thomas Heavey (Software Architect)

	Count: 4

Anne Beard (HR Director)

	Count: 1

Pat Dever (Enterprise Architect)

	Count: 1

Eric Edmonds (Philanthropy Coordinator)

	Count: 1

Note that the items output from a GroupBy() call are of type IGrouping<TKey, TElement>, which has a Key property holding the value that the query is grouping on (employee.DepartmentId). However, it does not have a property for the items within the group. Rather, IGrouping<TKey, TElement> derives from IEnumerable<TElement>, allowing for enumeration of the items within the group using a foreach statement or for aggregating the data into something such as a count of items (employeeGroup.Count()).
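As a brief illustration of reading the grouping key alongside the grouped items, the following sketch groups a hypothetical tuple array (invented names and IDs, not the book's CorporateData) and prints each group's Key before enumerating the group itself.

```csharp
using System;
using System.Linq;

public static class GroupKeySketch
{
    public static void Main()
    {
        // Hypothetical data for illustration only.
        var employees = new[]
        {
            (Name: "Ada", DepartmentId: 1),
            (Name: "Grace", DepartmentId: 2),
            (Name: "Linus", DepartmentId: 1)
        };

        // Each element of the result is an IGrouping<int, (string, int)>.
        var groupedEmployees =
            employees.GroupBy(employee => employee.DepartmentId);

        foreach (var employeeGroup in groupedEmployees)
        {
            // Key is the DepartmentId shared by every item in the group.
            Console.WriteLine(
                $"Department {employeeGroup.Key}: {employeeGroup.Count()}");

            // The group itself is enumerable.
            foreach (var employee in employeeGroup)
            {
                Console.WriteLine("\t" + employee.Name);
            }
        }
        // Prints:
        // Department 1: 2
        //     Ada
        //     Linus
        // Department 2: 1
        //     Grace
        // (the names are indented with a tab character)
    }
}
```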

Implementing a One-to-Many Relationship with GroupJoin()

Listing 15.19 and Listing 15.20 are virtually identical. Either Join() call could have produced the same output just by changing the tuple definition. When trying to create a list of employees, Listing 15.19 provides the correct result: the department ends up as an item of each tuple that represents a joined employee. However, Listing 15.20 is not ideal. Given support for collections, a preferable representation of a department would have a collection of employees rather than a separate tuple for each department-employee relationship. Listing 15.22 demonstrates the creation of such a child collection; Output 15.10 shows the preferred output.

Listing 15.22: Creating a Child Collection with System.Linq.Enumerable.GroupJoin()    

using System;            

using System.Collections.Generic;            

using System.Linq;            

// ...            

        Department[] departments = CorporateData.Departments;            

        Employee[] employees = CorporateData.Employees;            

        IEnumerable<(long Id, string Name, IEnumerable<Employee> Employees)> items =            

            departments.GroupJoin(            

                employees,            

                department => department.Id,            

                employee => employee.DepartmentId,            

                (department, departmentEmployees) => (            

                    department.Id,            

                    department.Name,            

                    departmentEmployees            

                ));            

        foreach (            

            (_, string name, IEnumerable<Employee> employeeCollection) in items)            

        {            

            Console.WriteLine(name);            

            foreach (Employee employee in employeeCollection)            

            {            

                Console.WriteLine("\t" + employee);            

            }            

        }            

        //...

Output 15.10

Corporate

	Mark Michaelis (Chief Computer Nerd)

Human Resources

	Anne Beard (HR Director)

Engineering

	Michael Stokesbary (Senior Computer Wizard)

	Brian Jones (Enterprise Integration Guru)

	Kevin Bost (Programmer Extraordinaire)

	Thomas Heavey (Software Architect)

Information Technology

	Pat Dever (Enterprise Architect)

Philanthropy

	Eric Edmonds (Philanthropy Coordinator)

To achieve the preferred result, we use System.Linq.Enumerable’s GroupJoin() method. The parameters are the same as those in Listing 15.20, except for the final tuple selected. In Listing 15.22, the lambda expression is of type Func<Department, IEnumerable<Employee>, (long Id, string Name, IEnumerable<Employee> Employees)>. Notice that we use the second parameter (of type IEnumerable<Employee>) to project the collection of employees for each department onto the resultant department tuple; thus each department in the resulting collection includes a list of the employees.

(Readers familiar with SQL will notice that, unlike Join(), GroupJoin() doesn’t have a SQL equivalent because the data returned by SQL is record based, not hierarchical.)

Advanced Topic

Implementing an Outer Join with GroupJoin()

The earlier inner joins are equi-joins because they are based on an equality comparison of the keys. Records appear in the resultant collection only if there are matching objects in both collections. On occasion, however, you might want to create a record even if the corresponding object doesn’t exist. For example, rather than leaving the Marketing department out of the final department list simply because it doesn’t have any employees, it would be preferable to include it with an empty employee list. To accomplish this, we perform a left outer join using a combination of GroupJoin() and SelectMany(), along with DefaultIfEmpty(). This is demonstrated in Listing 15.23 and Output 15.11.

Listing 15.23: Implementing an Outer Join Using GroupJoin() with SelectMany()    

using System;            

using System.Linq;            

// ...            

        Department[] departments = CorporateData.Departments;            

        Employee[] employees = CorporateData.Employees;            

        var items = departments.GroupJoin(            

            employees,            

            department => department.Id,            

            employee => employee.DepartmentId,            

            (department, departmentEmployees) => new            

            {            

                department.Id,            

                department.Name,            

                Employees = departmentEmployees            

            }).SelectMany(            

                departmentRecord =>            

                    departmentRecord.Employees.DefaultIfEmpty(),            

                (departmentRecord, employee) => new            

                {            

                    departmentRecord.Id,            

                    departmentRecord.Name,            

                    departmentRecord.Employees            

                }).Distinct();            

        foreach (var item in items)            

        {            

            Console.WriteLine(item.Name);            

            foreach (Employee employee in item.Employees)            

            {            

                Console.WriteLine("\t" + employee);            

            }            

        }            

        //...

Output 15.11

Corporate

	Mark Michaelis (Chief Computer Nerd)

Human Resources

	Anne Beard (HR Director)

Engineering

	Michael Stokesbary (Senior Computer Wizard)

	Brian Jones (Enterprise Integration Guru)

	Kevin Bost (Programmer Extraordinaire)

	Thomas Heavey (Software Architect)

Information Technology

	Pat Dever (Enterprise Architect)

Philanthropy

	Eric Edmonds (Philanthropy Coordinator)

Marketing
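For comparison, a left outer join is often written so that the result is flat, with one row per department-employee pair and a placeholder row for a department that has no employees. The following is a minimal sketch of that variation, assuming hypothetical Department and Employee records and invented data rather than the book's CorporateData types. DefaultIfEmpty() supplies a single null employee for an empty group, which the projection replaces with the text "(none)".

```csharp
using System;
using System.Linq;

// Hypothetical types for illustration; not the book's classes.
public record Department(int Id, string Name);
public record Employee(string Name, int DepartmentId);

public static class LeftOuterJoinSketch
{
    public static void Main()
    {
        Department[] departments =
        {
            new(1, "Engineering"),
            new(2, "Marketing")      // has no employees
        };
        Employee[] employees =
        {
            new("Ada", 1),
            new("Linus", 1)
        };

        var rows = departments
            .GroupJoin(
                employees,
                department => department.Id,
                employee => employee.DepartmentId,
                (department, departmentEmployees) =>
                    (Department: department,
                     Employees: departmentEmployees))
            .SelectMany(
                // An empty group yields one default (null) employee.
                entry => entry.Employees.DefaultIfEmpty(),
                (entry, employee) =>
                    (DepartmentName: entry.Department.Name,
                     EmployeeName: employee?.Name ?? "(none)"));

        foreach (var row in rows)
        {
            Console.WriteLine($"{row.DepartmentName}: {row.EmployeeName}");
        }
        // Prints:
        // Engineering: Ada
        // Engineering: Linus
        // Marketing: (none)
    }
}
```

Unlike Listing 15.23, this sketch uses the employee parameter of the result selector, so no Distinct() call is needed to collapse repeated department records.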

Calling SelectMany()

On occasion, you may have collections of collections. Listing 15.24 provides an example of such a scenario. The worldCup2006Finalists array contains two teams, each with a string array of players.

Listing 15.24: Calling SelectMany()    

using System;            

using System.Collections.Generic;            

using System.Linq;            

// ...            

        (string TeamName, string[] Players)            

            [] worldCup2006Finalists = new[]            

        {            

            (            

                TeamName: "France",            

                Players: new string[]            

                {            

                    "Fabien Barthez", "Gregory Coupet",            

                    "Mickael Landreau", "Eric Abidal",            

                    "Jean-Alain Boumsong", "Pascal Chimbonda",            

                    "William Gallas", "Gael Givet",            

                    "Willy Sagnol", "Mikael Silvestre",            

                    "Lilian Thuram", "Vikash Dhorasoo",            

                    "Alou Diarra", "Claude Makelele",            

                    "Florent Malouda", "Patrick Vieira",            

                    "Zinedine Zidane", "Djibril Cisse",            

                    "Thierry Henry", "Franck Ribery",            

                    "Louis Saha", "David Trezeguet",            

                    "Sylvain Wiltord",            

                }            

            ),            

            (            

                TeamName: "Italy",            

                Players: new string[]            

                {            

                    "Gianluigi Buffon", "Angelo Peruzzi",            

                    "Marco Amelia", "Cristian Zaccardo",            

                    "Alessandro Nesta", "Gianluca Zambrotta",            

                    "Fabio Cannavaro", "Marco Materazzi",            

                    "Fabio Grosso", "Massimo Oddo",            

                    "Andrea Barzagli", "Andrea Pirlo",            

                    "Gennaro Gattuso", "Daniele De Rossi",            

                    "Mauro Camoranesi", "Simone Perrotta",            

                    "Simone Barone", "Luca Toni",            

                    "Alessandro Del Piero", "Francesco Totti",            

                    "Alberto Gilardino", "Filippo Inzaghi",            

                    "Vincenzo Iaquinta",            

                }            

            )            

        };            

        IEnumerable<string> players =            

            worldCup2006Finalists.SelectMany(            

                team => team.Players);            

        Print(players);            

        //...

The output from this listing has each player’s name displayed on its own line in the order in which it appears in the code. The difference between Select() and SelectMany() is that Select() would return two items, one corresponding to each item in the original collection. Select() may project out a transform from the original type, but the number of items would not change. For example, worldCup2006Finalists.Select(team => team.Players) will return an IEnumerable<string[]> containing two string arrays.

In contrast, SelectMany() iterates across each item identified by the lambda expression (the array selected by Select() earlier) and hoists out each item into a single new collection that contains all the items of every child collection, in order. Instead of two arrays of players, SelectMany() combines each array selected and produces a single collection of all items.
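The difference in shape can be seen in a short sketch. The team and player names below are hypothetical and kept deliberately small so that the counts are easy to verify by eye.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectManySketch
{
    public static void Main()
    {
        // Hypothetical data for illustration only.
        (string TeamName, string[] Players)[] teams =
        {
            (TeamName: "Red", Players: new[] { "Ann", "Bo" }),
            (TeamName: "Blue", Players: new[] { "Cy", "Di", "Ed" })
        };

        // Select() keeps one result per team: a sequence of arrays.
        IEnumerable<string[]> perTeam =
            teams.Select(team => team.Players);

        // SelectMany() flattens the arrays into one sequence of names.
        IEnumerable<string> flattened =
            teams.SelectMany(team => team.Players);

        Console.WriteLine(perTeam.Count());               // 2
        Console.WriteLine(flattened.Count());             // 5
        Console.WriteLine(string.Join(", ", flattened));  // Ann, Bo, Cy, Di, Ed
    }
}
```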

More Standard Query Operators

Listing 15.25 shows code that uses some of the simpler APIs enabled by Enumerable; Output 15.12 shows the results.

Listing 15.25: More System.Linq.Enumerable Method Calls    

using System;            

using System.Collections.Generic;            

using System.Linq;            

public class Program            

{            

    public static void Main()            

    {            

        IEnumerable<object> stuff =            

            new object[] { new(), 1, 3, 5, 7, 9,            

                "\"thing\"", Guid.NewGuid() };            

        Print("Stuff: {0}", stuff);            

        IEnumerable<int> even = new int[] { 0, 2, 4, 6, 8 };            

        Print("Even integers: {0}", even);            

        IEnumerable<int> odd = stuff.OfType<int>();            

        Print("Odd integers: {0}", odd);            

        IEnumerable<int> numbers = even.Union(odd);            

        Print("Union of odd and even: {0}", numbers);            

        Print("Union with even: {0}", numbers.Union(even));            

        Print("Concat with odd: {0}", numbers.Concat(odd));            

        Print("Intersection with even: {0}",            

            numbers.Intersect(even));            

        Print("Distinct: {0}", numbers.Concat(odd).Distinct());            

        if (!numbers.SequenceEqual(            

            numbers.Concat(odd).Distinct()))            

        {            

            throw new Exception("Unexpectedly unequal");            

        }            

        else            

        {            

            Console.WriteLine(            

                @"Collection ""SequenceEquals""" +            

                $" {nameof(numbers)}.Concat(odd).Distinct())");            

        }            

        Print("Reverse: {0}", numbers.Reverse());            

        Print("Average: {0}", numbers.Average());            

        Print("Sum: {0}", numbers.Sum());            

        Print("Max: {0}", numbers.Max());            

        Print("Min: {0}", numbers.Min());            

    }            

    private static void Print<T>(            

            string format, IEnumerable<T> items)            

            where T : notnull =>            

        Console.WriteLine(format, string.Join(            

            ", ", items));            

    // ...            

}

Output 15.12

Stuff: System.Object, 1, 3, 5, 7, 9, "thing", 24c24a41-ee05-41b9-958e-50dd12e3981e

Even integers: 0, 2, 4, 6, 8

Odd integers: 1, 3, 5, 7, 9

Union of odd and even: 0, 2, 4, 6, 8, 1, 3, 5, 7, 9

Union with even: 0, 2, 4, 6, 8, 1, 3, 5, 7, 9

Concat with odd: 0, 2, 4, 6, 8, 1, 3, 5, 7, 9, 1, 3, 5, 7, 9

Intersection with even: 0, 2, 4, 6, 8

Distinct: 0, 2, 4, 6, 8, 1, 3, 5, 7, 9

Collection "SequenceEquals" numbers.Concat(odd).Distinct())

Reverse: 9, 7, 5, 3, 1, 8, 6, 4, 2, 0

Average: 4.5

Sum: 45

Max: 9

Min: 0

None of the API calls in Listing 15.25 requires a lambda expression. Table 15.1 and Table 15.2 describe each method and provide an example. Included in System.Linq.Enumerable is a collection of aggregate functions that enumerate the collection and calculate a result (shown in Table 15.2). Count is one example of an aggregate function already shown in the chapter.

Table 15.1: Simpler Standard Query Operators

Method	Description
OfType<T>()	Forms a query over a collection that returns only the items of a particular type, where the type is identified in the type parameter of the OfType<T>() method call.
Union()	Combines two collections to form a superset of all the items in both collections. The final collection does not include duplicate items even if the same item existed in both collections.
Concat()	Combines two collections to form a superset of both collections. Duplicate items are not removed from the resultant collection. Concat() will preserve the ordering. That is, concatenating {A, B} with {C, D} will produce {A, B, C, D}.
Intersect()	Extracts the collection of items that exist in both original collections.
Distinct()	Filters out duplicate items from a collection so that each item within the resultant collection is unique.
SequenceEqual()	Compares two collections and returns a Boolean indicating whether the collections are identical, including the order of items within the collection. (This is a very helpful method when testing expected results.)
Reverse()	Reverses the items within a collection so that they occur in reverse order when iterating over the collection.

Table 15.2: Aggregate Functions on System.Linq.Enumerable

Method	Description
Count()	Provides a total count of the number of items within the collection
Average()	Calculates the average value for a numeric collection
Sum()	Computes the sum of the values within a numeric collection
Max()	Determines the maximum value among a collection of numeric values
Min()	Determines the minimum value among a collection of numeric values

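Note that the query-building methods in Table 15.1 (OfType<T>(), Union(), Concat(), Intersect(), Distinct(), and Reverse()) use deferred execution: they do not enumerate their source until the result is iterated. In contrast, SequenceEqual() and the aggregate functions in Table 15.2 return a single value, so calling them enumerates the collection immediately and triggers the execution of any deferred query they are applied to.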
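The timing difference between a deferred operator and an aggregate can be observed by changing the source after the calls are made. The following is a minimal sketch with invented values: Distinct() sees the item added later because it runs only when iterated, whereas the earlier Sum() result does not change.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSketch
{
    public static void Main()
    {
        List<int> source = new() { 1, 2, 2 };

        // Deferred: nothing is enumerated yet.
        IEnumerable<int> distinct = source.Distinct();

        // Immediate: the list is enumerated here and the result stored.
        int sum = source.Sum();

        source.Add(3);

        // The deferred query runs now, so it includes the new item.
        Console.WriteLine(string.Join(", ", distinct));  // 1, 2, 3

        // The stored aggregate reflects the list as it was earlier.
        Console.WriteLine(sum);                          // 5

        // Calling the aggregate again enumerates the updated list.
        Console.WriteLine(source.Sum());                 // 8
    }
}
```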

Advanced Topic

Queryable Extensions for IQueryable<T>

One virtually identical interface to IEnumerable<T> is IQueryable<T>. Because IQueryable<T> derives from IEnumerable<T>, it has all the members of IEnumerable<T> that are declared directly (e.g., GetEnumerator()). Extension methods are not inherited, however, so IQueryable<T> doesn’t have any of the Enumerable extension methods. Nevertheless, it does have a similar extending class called System.Linq.Queryable that adds to IQueryable<T> almost all of the same methods that Enumerable added to IEnumerable<T>. Therefore, it provides a very similar programming interface.

What makes IQueryable<T> unique is that it enables custom LINQ providers. A LINQ provider subdivides expressions into their constituent parts. Once divided, the expression can be translated into another language, serialized for remote execution, injected with an asynchronous execution pattern, and much more. Essentially, LINQ providers allow for an interception mechanism into a standard collection API; behavior relating to the queries and collection can be injected via this seemingly limitless functionality.

For example, LINQ providers allow for the translation of a query expression from C# into SQL that is then executed on a remote database. In so doing, the C# programmer can remain in their primary object-oriented language and leave the translation to SQL to the underlying LINQ provider. Through this type of expression, programming languages can span the impedance mismatch between the object-oriented world and the relational database.

In the case of IQueryable<T>, vigilance regarding deferred execution is even more critical. Imagine, for example, a LINQ provider that returns data from a database. Rather than retrieving the data from the database regardless of the selection criteria, the query would produce an implementation of IQueryable<T> that possibly includes context information such as the connection string, but not the data itself. The data retrieval wouldn’t occur until the call to GetEnumerator() or even MoveNext(). However, the GetEnumerator() call is generally implicit, such as when iterating over the collection with foreach or calling a method that enumerates immediately, such as Count<T>() or ToList<T>(). Cases such as this require developers to be wary of the subtle and repeated calls to any expensive operation that deferred execution might involve. For example, if calling GetEnumerator() involves a distributed call over the network to a database, it would be wise to avoid unintentionally iterating more than once with Count() or foreach.
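The cost of repeated enumeration can be sketched without a database. In the following example, ExpensiveSource() is a hypothetical stand-in for a remote data source, and AsQueryable() wraps it in the in-memory IQueryable<T> implementation that ships with System.Linq; a real LINQ provider would translate the query rather than run it locally. A counter records how many times the source is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryableSketch
{
    private static int _enumerations;

    // Hypothetical stand-in for an expensive remote call.
    // The body runs once per enumeration of the returned sequence.
    private static IEnumerable<int> ExpensiveSource()
    {
        _enumerations++;
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static void Main()
    {
        // Building the query does not enumerate the source.
        IQueryable<int> query =
            ExpensiveSource().AsQueryable().Where(n => n > 1);
        Console.WriteLine(_enumerations);   // 0

        // Each of these executes the query against the source.
        Console.WriteLine(query.Count());   // 2
        foreach (int n in query) { }
        Console.WriteLine(_enumerations);   // 2

        // Materializing once lets later work reuse the cached results.
        List<int> cached = query.ToList();
        Console.WriteLine(cached.Count);    // 2
        foreach (int n in cached) { }
        Console.WriteLine(_enumerations);   // 3
    }
}
```

Caching with ToList() trades memory for fewer round trips, which is usually worthwhile when the results are needed more than once and the underlying data is not expected to change between uses.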

________________________________________

8. Starting in C# 3.0.
