Primary Collection Classes

Five key categories of collection classes exist, and they differ from one another in terms of how data is inserted, stored, and retrieved. Each generic class is located in the System.Collections.Generic namespace, and their nongeneric equivalents are found in the System.Collections namespace.

List Collections: List<T>

The List<T> class has properties similar to an array. The key difference is that these classes automatically expand as the number of elements increases. (In contrast, an array size is constant.) Furthermore, lists can shrink via explicit calls to TrimExcess() or by setting the Capacity property (see Figure 17.2).

Figure 17.2: List<> class diagrams
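As a minimal sketch of the shrinking behavior described above, the following reserves extra capacity and then releases it. The CapacityDemo class, the MeasureTrim() helper, and the initial capacity of 100 are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Hypothetical helper: returns the capacity before and after
    // trimming a mostly empty list.
    public static (int Before, int After) MeasureTrim()
    {
        // Reserve room for 100 elements up front (an arbitrary figure).
        List<int> numbers = new(100) { 1, 2, 3 };
        int before = numbers.Capacity;

        // Release the unused slots; Count is unaffected.
        numbers.TrimExcess();

        return (before, numbers.Capacity);
    }

    public static void Main()
    {
        (int before, int after) = MeasureTrim();
        Console.WriteLine($"Capacity before: {before}, after: {after}");
    }
}
```

Note that TrimExcess() leaves the list alone when it is already nearly full, so a call is only worthwhile after a list has shrunk substantially or was heavily over-allocated.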

These classes are categorized as list collections whose distinguishing functionality is that each element can be individually accessed by index, just like an array. Therefore, you can set and access elements in the list collection classes using the index operator, where the index parameter value corresponds to the position of an element in the collection. Listing 17.1 shows an example, and Output 17.1 shows the results.

Listing 17.1: Using List<T>
using System;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        List<string> list = new() { "Sneezy", "Happy", "Dopey", "Doc",
                                    "Sleepy", "Bashful", "Grumpy" };

        // Sort in place using the default string comparison
        list.Sort();

        Console.WriteLine(
            $"In alphabetical order { list[0] } is the "
            + $"first dwarf while { list[^1] } is the last.");

        list.Remove("Grumpy");
    }
}
Output 17.1
In alphabetical order Bashful is the first dwarf while Sneezy is the last.

C# is zero-index based; therefore, index 0 in Listing 17.1 corresponds to the first element and index 6 indicates the seventh element. Retrieving elements by index does not involve a search. Rather, it entails a quick and simple “jump” operation to a location in memory.

A List<T> is an ordered collection; the Add() method appends the given item to the end of the list. Before the call to Sort() in Listing 17.1, "Sneezy" is first and "Grumpy" is last; after the call, the list is sorted into alphabetical order rather than the order in which items were added. Some collections automatically sort elements as they are added, but List<T> is not one of them; an explicit call to Sort() is required for the elements to be sorted.

To remove an element, you use the Remove() or RemoveAt() method to either remove a given element or remove whatever element is at a particular index, respectively.
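The difference between the two removal methods can be sketched as follows. The RemoveDemo class and its RemoveTwo() helper are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;

public static class RemoveDemo
{
    // Hypothetical helper: removes one element by value and one by
    // position, then returns what is left.
    public static List<string> RemoveTwo()
    {
        List<string> list = new() { "Bashful", "Doc", "Dopey", "Grumpy" };

        // Remove() searches for the first matching element. It returns
        // true on success and false (without throwing) if nothing matches.
        list.Remove("Grumpy");
        list.Remove("Snow White"); // no match; the list is unchanged

        // RemoveAt() removes by position; an out-of-range index
        // throws ArgumentOutOfRangeException.
        list.RemoveAt(0); // removes "Bashful"

        return list;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", RemoveTwo())); // Doc, Dopey
    }
}
```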

Advanced Topic
Customizing Collection Sorting

You might have wondered how the List<T>.Sort() method in Listing 17.1 knew how to sort the elements of the list into alphabetical order. The string type implements the IComparable<string> interface, which has one method, CompareTo(). It returns an integer indicating whether the element passed is greater than, less than, or equal to the current element. If the element type implements the generic IComparable<T> interface (or the nongeneric IComparable interface), the sorting algorithm will, by default, use it to determine the sorted order.

But what if either the element type does not implement IComparable<T> or the default logic for comparing two things does not meet your needs? To specify a nondefault sort order, you can call the overload of List<T>.Sort() that takes an IComparer<T> as an argument.

The difference between IComparable<T> and IComparer<T> is subtle but important. The first interface means, “I know how to compare myself to another instance of my type.” The latter means, “I know how to compare two things of a given type.”

The IComparer<T> interface is typically used when there are many different possible ways of sorting a data type and none is obviously the best. For example, you might have a collection of Contact objects that you sometimes want to sort by name, by location, by birthday, by geographic region, or by any number of other possibilities. Rather than choosing a sorting strategy and making the Contact class implement IComparable<Contact>, it might be wiser to create several different classes that implement IComparer<Contact>. Listing 17.2 shows a sample implementation of a LastName, FirstName comparison.

Listing 17.2: Implementing IComparer<T>
using System;
using System.Collections.Generic;
// ...
public class Contact
{
    public string FirstName { get; private set; }
    public string LastName { get; private set; }

    public Contact(string firstName, string lastName)
    {
        this.FirstName = firstName;
        this.LastName = lastName;
    }
}
public class NameComparison : IComparer<Contact>
{
    // Orders by last name, then first name; null references sort last
    public int Compare(Contact? x, Contact? y)
    {
        if(Object.ReferenceEquals(x, y))
            return 0;
        if(x is null)
            return 1;
        if(y is null)
            return -1;
        int result = StringCompare(x.LastName, y.LastName);
        if(result == 0)
            result = StringCompare(x.FirstName, y.FirstName);
        return result;
    }

    private static int StringCompare(string? x, string? y)
    {
        if(Object.ReferenceEquals(x, y))
            return 0;
        if(x is null)
            return 1;
        if(y is null)
            return -1;
        return x.CompareTo(y);
    }
}

To sort a List<Contact> by last name and then first name, you can call contactList.Sort(new NameComparison()).
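The same pattern applies to any element type. As a self-contained sketch that does not depend on the Contact class, the following sorts strings by length and breaks ties with an ordinal comparison. LengthThenOrdinalComparer and ComparerDemo are hypothetical names introduced only for this example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical comparer: shorter strings first, ties broken by
// ordinal comparison. Null references sort last, as in Listing 17.2.
public sealed class LengthThenOrdinalComparer : IComparer<string>
{
    public int Compare(string? x, string? y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x is null) return 1;
        if (y is null) return -1;

        int result = x.Length.CompareTo(y.Length);
        return result != 0 ? result : string.CompareOrdinal(x, y);
    }
}

public static class ComparerDemo
{
    public static List<string> SortByLength()
    {
        List<string> list = new() { "Sneezy", "Doc", "Happy", "Dopey" };

        // Pass the comparer to the Sort() overload that accepts IComparer<T>.
        list.Sort(new LengthThenOrdinalComparer());
        return list;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SortByLength()));
        // Doc, Dopey, Happy, Sneezy
    }
}
```

Because the comparer is a separate class, the same list can be re-sorted under a different rule simply by passing a different IComparer<string> instance.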

Total Ordering

You are required to produce a total order when implementing IComparable<T> or IComparer<T>. Your implementation of CompareTo() or Compare() must provide a fully consistent ordering for any possible pair of items. This ordering is required to have a number of basic characteristics. For example, every element is required to be considered equal to itself. If an element X is considered to be equal to element Y, and element Y is considered to be equal to element Z, then all three elements X, Y, and Z must be considered equal to one another. If an element X is considered to be greater than Y, Y must be considered to be less than X. And there must be no “transitivity paradoxes”; that is, you cannot have X greater than Y, Y greater than Z, and Z greater than X. If you fail to provide a total ordering, the behavior of the sort algorithm is undefined; it may produce a nonsensical ordering, it may crash, it may go into an infinite loop, and so on.

Notice, for example, how the comparer in Listing 17.2 ensures a total order, even if the arguments are null references. It would not be legal to say, “If either element is null, then return zero,” for example, because then two non-null things could be equal to null but not equal to each other.

Guidelines
DO ensure that custom comparison logic produces a consistent “total order.”

Searching a List<T>

To search List<T> for a particular element, you use the Contains(), IndexOf(), LastIndexOf(), and BinarySearch() methods. The first three methods search through the list, starting at the first element (or the last element for LastIndexOf()), and examine each element until the desired one is found. The execution time for these algorithms is proportional to the number of elements searched before a hit occurs. (Be aware that the collection classes do not require that all the elements within the collection are unique. If two or more elements in the collection are the same, IndexOf() returns the first index and LastIndexOf() returns the last index.)

BinarySearch() uses a much faster binary search algorithm but requires that the elements be sorted. A useful feature of the BinarySearch() method is that if the element is not found, a negative integer is returned. The bitwise complement (~) of this value is the index of the next element larger than the element being sought, or the total element count if there is no greater value. This provides a convenient means to insert new values into the list at the specific location so as to maintain sorting. Listing 17.3 provides an example.

Listing 17.3: Using the Bitwise Complement of the BinarySearch() Result
using System;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        List<string> list = new();
        int search;

        list.Add("public");
        list.Add("protected");
        list.Add("private");

        // BinarySearch() requires a sorted list
        list.Sort();

        search = list.BinarySearch("protected internal");
        if(search < 0)
        {
            // ~search is the index at which to insert to keep the list sorted
            list.Insert(~search, "protected internal");
        }

        foreach(string accessModifier in list)
        {
            Console.WriteLine(accessModifier);
        }
    }
}

Beware that if the list is not first sorted, this code will not necessarily find an element, even if it is in the list. The results of Listing 17.3 appear in Output 17.2.

Output 17.2
private
protected
protected internal
public

Advanced Topic
Finding Multiple Items with FindAll()

Sometimes you must find multiple items within a list, and your search criteria are more complex than merely looking for specific values. To support this scenario, System.Collections.Generic.List<T> includes a FindAll() method. FindAll() takes a parameter of type Predicate<T>, which is a delegate (a reference to a method). Listing 17.4 demonstrates how to use the FindAll() method.

Listing 17.4: Demonstrating FindAll() and Its Predicate Parameter
using System;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        List<int> list = new();
        list.Add(1);
        list.Add(2);
        list.Add(3);
        list.Add(2);

        // Even is converted to a Predicate<int> delegate
        List<int> results = list.FindAll(Even);

        foreach (int number in results)
        {
            Console.WriteLine(number);
        }
    }
    public static bool Even(int value) =>
        (value % 2) == 0;
}

In Listing 17.4’s call to FindAll(), you pass the Even() method as the delegate argument. This method returns true when the integer argument value is even. FindAll() takes the delegate instance and calls into Even() for each item within the list. Each time the return value is true, it adds the item to a new List<T> instance, and it returns this instance once it has checked each item within list. A complete discussion of delegates occurs in Chapter 13.
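A predicate does not have to be a named method. As a brief sketch, the same search can be written with a lambda expression, which the compiler converts to a Predicate<int>. The FindAllDemo class and FindEvens() helper are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;

public static class FindAllDemo
{
    // Hypothetical helper: returns a new list holding the even values.
    public static List<int> FindEvens(List<int> source) =>
        // The lambda is converted to a Predicate<int> by the compiler.
        source.FindAll(value => value % 2 == 0);

    public static void Main()
    {
        List<int> list = new() { 1, 2, 3, 2 };

        // Prints 2 twice, once for each even element found.
        foreach (int number in FindEvens(list))
        {
            Console.WriteLine(number);
        }
    }
}
```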

Dictionary Collections: Dictionary<TKey, TValue>

Another category of collection classes is the dictionary classes—specifically, Dictionary<TKey, TValue> (see Figure 17.3). Unlike the list collections, dictionary classes store name/value pairs. The name functions as a unique key that can be used to look up the corresponding element in a manner similar to that of using a primary key to access a record in a database. This adds some complexity to the access of dictionary elements, but because lookups by key are efficient operations, this is a useful collection. Note that the key may be any data type, not just a string or a numeric value.

Figure 17.3: Dictionary class diagrams

One option for inserting elements into a dictionary is to use the Add() method, passing both the key and the value as arguments, as shown in Listing 17.5.

Listing 17.5: Adding Items to a Dictionary<TKey, TValue>
using System;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        var colorMap = new Dictionary<string, ConsoleColor>
        {
            ["Error"] = ConsoleColor.Red,
            ["Warning"] = ConsoleColor.Yellow,
            ["Information"] = ConsoleColor.Green
        };
        colorMap.Add("Verbose", ConsoleColor.White);
        // ...
    }
}

After initializing the dictionary with a dictionary initializer (see the section “Collection Initializers” in Chapter 15), Listing 17.5 inserts a ConsoleColor of White for the key "Verbose". If an element with the same key has already been added, Add() throws an ArgumentException.

An alternative for adding elements is to use the indexer, as shown in Listing 17.6.

Listing 17.6: Inserting Items in a Dictionary<TKey, TValue> Using the Index Operator
using System;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        var colorMap = new Dictionary<string, ConsoleColor>
            {
                ["Error"] = ConsoleColor.Red,
                ["Warning"] = ConsoleColor.Yellow,
                ["Information"] = ConsoleColor.Green
            };

        colorMap["Verbose"] = ConsoleColor.White;
        colorMap["Error"] = ConsoleColor.Cyan;

        // ...
    }
}

The first thing to observe in Listing 17.6 is that the index operator does not require an integer. Instead, the index operand type is specified by the first type argument (string), and the type of the value that is set or retrieved by the indexer is specified by the second type argument (ConsoleColor).

The second thing to notice in Listing 17.6 is that the same key ("Error") is used twice. In the first assignment, no dictionary value corresponds to the given key. When this happens, the dictionary collection classes insert a new value with the supplied key. In the second assignment, an element with the specified key already exists. Instead of inserting an additional element, the prior ConsoleColor value for the "Error" key is replaced with ConsoleColor.Cyan.

Attempting to read a value from a dictionary with a nonexistent key throws a KeyNotFoundException. The ContainsKey() method allows you to check whether a particular key is used before accessing its value, thereby avoiding the exception.
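The following sketch shows two ways to guard a lookup: ContainsKey() followed by the indexer, and TryGetValue(), which tests for the key and retrieves the value in a single call. The LookupDemo class, the GetColor() helper, and the Gray fallback are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LookupDemo
{
    // Hypothetical helper: returns the mapped color, or a fallback
    // (Gray, chosen arbitrarily) when the key is absent.
    public static ConsoleColor GetColor(
        Dictionary<string, ConsoleColor> colorMap, string key)
    {
        // TryGetValue() checks for the key and fetches the value together.
        return colorMap.TryGetValue(key, out ConsoleColor color)
            ? color
            : ConsoleColor.Gray;
    }

    public static void Main()
    {
        var colorMap = new Dictionary<string, ConsoleColor>
        {
            ["Error"] = ConsoleColor.Red
        };

        // ContainsKey() guards the indexer against KeyNotFoundException.
        if (colorMap.ContainsKey("Error"))
        {
            Console.WriteLine(colorMap["Error"]);       // Red
        }

        Console.WriteLine(GetColor(colorMap, "Trace")); // Gray
    }
}
```

TryGetValue() is often the more convenient choice when the value is needed immediately, because the key is hashed and located only once.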

The Dictionary<TKey, TValue> is implemented as a hash table; this data structure provides extremely fast access when searching by key, regardless of the number of values stored in the dictionary. By contrast, checking whether there is a particular value in the dictionary collections is a time-consuming operation with linear performance characteristics, much like searching an unsorted list. To do this, you use the ContainsValue() method, which searches sequentially through each element in the collection.

You remove a dictionary element using the Remove() method, passing the key, not the element value, as the argument.
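As a short sketch of both operations, the following checks for a value with ContainsValue() and then removes the corresponding entry by key. RemoveByKeyDemo and Build() are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;

public static class RemoveByKeyDemo
{
    // Hypothetical helper: builds a small sample dictionary.
    public static Dictionary<string, ConsoleColor> Build() => new()
    {
        ["Error"] = ConsoleColor.Red,
        ["Warning"] = ConsoleColor.Yellow
    };

    public static void Main()
    {
        Dictionary<string, ConsoleColor> colorMap = Build();

        // ContainsValue() scans the entries one by one (linear time).
        Console.WriteLine(colorMap.ContainsValue(ConsoleColor.Yellow)); // True

        // Remove() takes the key and reports whether an entry was removed.
        Console.WriteLine(colorMap.Remove("Warning")); // True
        Console.WriteLine(colorMap.Remove("Warning")); // False: already gone

        Console.WriteLine(colorMap.ContainsValue(ConsoleColor.Yellow)); // False
    }
}
```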

Because both the key and the value are required to add a value to the dictionary, the loop variable of a foreach loop that enumerates elements of a dictionary must be KeyValuePair<TKey, TValue>. Listing 17.7 shows a snippet of code demonstrating the use of a foreach loop to enumerate the keys and values in a dictionary. The output appears in Output 17.3.

Listing 17.7: Iterating over Dictionary<TKey, TValue> with foreach
using System;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        var colorMap = new Dictionary<string, ConsoleColor>
            {
                ["Error"] = ConsoleColor.Red,
                ["Warning"] = ConsoleColor.Yellow,
                ["Information"] = ConsoleColor.Green,
                ["Verbose"] = ConsoleColor.White
            };

        Print(colorMap);
    }

    private static void Print(
        IEnumerable<KeyValuePair<string, ConsoleColor>> items)
    {
        // Each item exposes both the key and its associated value
        foreach (KeyValuePair<string, ConsoleColor> item in items)
        {
            Console.ForegroundColor = item.Value;
            Console.WriteLine(item.Key);
        }
    }
}
Output 17.3
Error
Warning
Information
Verbose

Note that the order of the items shown here is the order in which the items were added to the dictionary, just as if they had been added to a list. Implementations of dictionaries will often enumerate the keys and values in the order in which they were added to the dictionary, but this feature is neither required nor documented, so you should not rely on it.

Guidelines
DO NOT make any unwarranted assumptions about the order in which elements of a collection will be enumerated. If the collection is not documented as enumerating its elements in a particular order, it is not guaranteed to produce elements in any particular order.

If you want to deal only with keys or only with elements within a dictionary class, they are available via the Keys and Values properties, respectively. The data type returned from these properties is of type ICollection<T>. The data returned by these properties is a reference to the data within the original dictionary collection rather than a copy; changes within the dictionary are automatically reflected in the collection returned by the Keys and Values properties.
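The live-view behavior of Keys can be observed with a small sketch such as the following, in which the key collection is obtained once and then consulted again after the dictionary changes. KeysViewDemo and CountKeys() are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;

public static class KeysViewDemo
{
    // Hypothetical helper: returns the key count seen through the same
    // Keys reference before and after an entry is added.
    public static (int Before, int After) CountKeys()
    {
        var colorMap = new Dictionary<string, ConsoleColor>
        {
            ["Error"] = ConsoleColor.Red
        };

        // Obtain the key collection once; it is a view, not a snapshot.
        ICollection<string> keys = colorMap.Keys;
        int before = keys.Count;

        colorMap["Warning"] = ConsoleColor.Yellow;

        // The same reference now reports the additional key.
        return (before, keys.Count);
    }

    public static void Main()
    {
        (int before, int after) = CountKeys();
        Console.WriteLine($"Keys before: {before}, after: {after}");
        // Keys before: 1, after: 2
    }
}
```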

Advanced Topic
Customizing Dictionary Equality

To determine whether a given key matches any existing key in the dictionary, the dictionary must be able to compare two keys for equality. This is analogous to the way that lists must be able to compare two items to determine their order. (For an example, see “Advanced Topic: Customizing Collection Sorting” earlier in this chapter.) By default, two instances of a value type are compared by checking whether they contain exactly the same data, and two instances of a reference type are compared to see whether both reference the same object. However, it is occasionally necessary to be able to compare two instances as equal even if they are not exactly the same value or exactly the same reference.

For example, suppose you wish to create a Dictionary<Contact, string> using the Contact type from Listing 17.2. However, you want any two Contact objects to compare as equal if they have the same first and last names, regardless of whether the two objects are reference equal. Much as you can provide an implementation of IComparer<T> to sort a list, so you can provide an implementation of IEqualityComparer<T> to determine if two keys are to be considered equal. This interface requires two methods: one that returns whether two items are equal and one that returns a “hash code” that the dictionary can use to facilitate fast indexing. Listing 17.8 shows an example.

Listing 17.8: Implementing IEqualityComparer<T>
using System.Collections.Generic;

// Contact is the class defined in Listing 17.2
public class ContactEquality : IEqualityComparer<Contact>
{
    public bool Equals(Contact? x, Contact? y)
    {
        if(object.ReferenceEquals(x, y))
            return true;
        if(x is null || y is null)
            return false;
        return x.LastName == y.LastName &&
            x.FirstName == y.FirstName;
    }

    public int GetHashCode(Contact x)
    {
        if(x is null)
            return 0;

        // Combine the hash codes of the same fields that Equals() compares
        int h1 = x.FirstName is null ? 0 : x.FirstName.GetHashCode();
        int h2 = x.LastName is null ? 0 : x.LastName.GetHashCode();
        return h1 * 23 + h2;
    }
}

To create a dictionary that uses this equality comparer, you can use the constructor new Dictionary<Contact, string>(new ContactEquality()).
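The effect of supplying an equality comparer can be seen without writing one. As a sketch that swaps in the framework's StringComparer.OrdinalIgnoreCase (which implements IEqualityComparer<string>) in place of a custom class, the following dictionary treats keys that differ only in case as the same key. ComparerKeyDemo and Build() are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerKeyDemo
{
    // Hypothetical helper: builds a dictionary whose keys ignore case.
    public static Dictionary<string, ConsoleColor> Build()
    {
        // The comparer passed to the constructor decides key equality
        // and supplies the hash codes used for lookups.
        var colorMap = new Dictionary<string, ConsoleColor>(
            StringComparer.OrdinalIgnoreCase)
        {
            ["Error"] = ConsoleColor.Red
        };

        // "ERROR" matches the existing "Error" key, so this replaces
        // the value rather than adding a second entry.
        colorMap["ERROR"] = ConsoleColor.Cyan;

        return colorMap;
    }

    public static void Main()
    {
        Dictionary<string, ConsoleColor> colorMap = Build();
        Console.WriteLine(colorMap.Count);    // 1
        Console.WriteLine(colorMap["error"]); // Cyan
    }
}
```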

Beginner Topic
Requirements of Equality Comparisons

As discussed in Chapter 10, several important rules apply to the equality and hash code algorithms. Conformance to these rules is critical in the context of collections. Just as correctly sorting a list requires a custom ordering comparison to provide a total order, so, too, does a hash table require certain guarantees to be met by a custom equality comparison. The most important requirement is that if Equals() returns true for two objects, GetHashCode() must return the same value for those two objects. Note that the converse is not true: Two unequal items may have the same hash code. (Indeed, there must be two unequal items that have the same hash code because there are only 2^32 possible hash codes but many more than that number of unequal objects!)

The second-most important requirement is that two calls to GetHashCode() on the same item must produce the same result for at least as long as the item is in the hash table. Note, however, that two objects that “look equal” are not required to give the same hash code in two separate runs of a program. For example, it is perfectly legal for a given contact to be assigned one hash code today and, two weeks later when you run the program a second time, for “the same” contact to be given a different hash code. Do not persist hash codes into a database and expect them to remain stable across different runs of a program.

Ideally, the result of GetHashCode() should appear to be random. That is, small changes to the input should cause large changes to the output, and the result should be distributed roughly evenly across all possible integer values. It is difficult, however, to devise a hash algorithm that is extremely fast and produces extremely well-distributed output; try to find a good middle ground.

Finally, GetHashCode() and Equals() must not throw exceptions. Notice how the code in Listing 17.8 is careful to never dereference a null reference, for example.

To summarize, here are the key principles:

Equal objects must have equal hash codes.
The hash code of an object should not change for the life of the instance (at least while it is in the hash table).
The hashing algorithm should quickly produce a well-distributed hash.
The hashing algorithm should avoid throwing exceptions in all possible object states.

Sorted Collections: SortedDictionary<TKey, TValue> and SortedList<TKey, TValue>

The sorted collection classes (see Figure 17.4) store their elements sorted by key; this applies to both SortedDictionary<TKey, TValue> and SortedList<TKey, TValue>. If we change the code in Listing 17.7 to use a SortedDictionary<string, ConsoleColor> instead of a Dictionary<string, ConsoleColor>, the output of the program is as appears in Output 17.4.