Five key categories of collection classes exist, and they differ from one another in terms of how data is inserted, stored, and retrieved. Each generic class is located in the System.Collections.Generic namespace, and their nongeneric equivalents are found in the System.Collections namespace.
The List<T> class has properties similar to an array. The key difference is that these classes automatically expand as the number of elements increases. (In contrast, an array size is constant.) Furthermore, lists can shrink via explicit calls to TrimToSize() or Capacity (see Figure 17.2).
These classes are categorized as list collections whose distinguishing functionality is that each element can be individually accessed by index, just like an array. Therefore, you can set and access elements in the list collection classes using the index operator, where the index parameter value corresponds to the position of an element in the collection. Listing 17.1 shows an example, and Output 17.1 shows the results.
C# is zero-index based; therefore, index 0 in Listing 17.1 corresponds to the first element and index 6 indicates the seventh element. Retrieving elements by index does not involve a search. Rather, it entails a quick and simple “jump” operation to a location in memory.
A List<T> is an ordered collection; the Add() method appends the given item to the end of the list. Before the call to Sort() in Listing 17.1, "Sneezy" is first and "Grumpy" is last; after the call, the list is sorted into alphabetical order rather than the order in which items were added. Some collections automatically sort elements as they are added, but List<T> is not one of them; an explicit call to Sort() is required for the elements to be sorted.
To remove an element, you use the Remove() or RemoveAt() method to either remove a given element or remove whatever element is at a particular index, respectively.
You might have wondered how the List<T>.Sort() method in Listing 17.1 knew how to sort the elements of the list into alphabetical order. The string type implements the IComparable<string> interface, which has one method, CompareTo(). It returns an integer indicating whether the element passed is greater than, less than, or equal to the current element. If the element type implements the generic IComparable<T> interface (or the nongeneric IComparable interface), the sorting algorithm will, by default, use it to determine the sorted order.
But what if either the element type does not implement IComparable<T> or the default logic for comparing two things does not meet your needs? To specify a nondefault sort order, you can call the overload of List<T>.Sort(), which takes IComparer<T> as an argument.
The difference between IComparable<T> and IComparer<T> is subtle but important. The first interface means, “I know how to compare myself to another instance of my type.” The latter means, “I know how to compare two things of a given type.”
The IComparer<T> interface is typically used when there are many different possible ways of sorting a data type and none is obviously the best. For example, you might have a collection of Contact objects that you sometimes want to sort by name, by location, by birthday, by geographic region, or by any number of other possibilities. Rather than choosing a sorting strategy and making the Contact class implement IComparable<Contact>, it might be wiser to create several different classes that implement IComparer<Contact>. Listing 17.2 shows a sample implementation of a LastName, FirstName comparison.
To sort a List<Contact> by last name and then first name, you can call contactList.Sort(new NameComparer()).
You are required to produce a total order when implementing IComparable<T> or IComparer<T>. Your implementation of CompareTo must provide a fully consistent ordering for any possible pair of items. This ordering is required to have a number of basic characteristics. For example, every element is required to be considered equal to itself. If an element X is considered to be equal to element Y, and element Y is considered to be equal to element Z, then all three elements X, Y, and Z must be considered equal to one another. If an element X is considered to be greater than Y, Y must be considered to be less than X. And there must be no “transitivity paradoxes”—that is, you cannot have X greater than Y, Y greater than Z, and Z greater than X. If you fail to provide a total ordering, the action of the sort algorithm is undefined; it may produce a crazy ordering, it may crash, it may go into an infinite loop, and so on.
Notice, for example, how the comparer in Listing 17.2 ensures a total order, even if the arguments are null references. It would not be legal to say, “If either element is null, then return zero,” for example, because then two non-null things could be equal to null but not equal to each other.
To search List<T> for a particular element, you use the Contains(), IndexOf(), LastIndexOf(), and BinarySearch() methods. The first three methods search through the array, starting at the first element (or the last element for LastIndexOf()), and examine each element until the desired one is found. The execution time for these algorithms is proportional to the number of elements searched before a hit occurs. (Be aware that the collection classes do not require that all the elements within the collection are unique. If two or more elements in the collection are the same, IndexOf() returns the first index and LastIndexOf() returns the last index.)
BinarySearch() uses a much faster binary search algorithm but requires that the elements be sorted. A useful feature of the BinarySearch() method is that if the element is not found, a negative integer is returned. The bitwise complement (~) of this value is the index of the next element larger than the element being sought, or the total element count if there is no greater value. This provides a convenient means to insert new values into the list at the specific location so as to maintain sorting. Listing 17.3 provides an example.
Beware that if the list is not first sorted, this code will not necessarily find an element, even if it is in the list. The results of Listing 17.3 appear in Output 17.2.
Sometimes you must find multiple items within a list, and your search criteria are more complex than merely looking for specific values. To support this scenario, System.Collections.Generic.List<T> includes a FindAll() method. FindAll() takes a parameter of type Predicate<T>, which is a delegate (a reference to a method). Listing 17.4 demonstrates how to use the FindAll() method.
In Listing 17.4’s call to FindAll(), you pass a delegate instance, Even(). This method returns true when the integer argument value is even. FindAll() takes the delegate instance and calls into Even() for each item within the list.2 Each time the return value is true, it adds it to a new List<T> instance and then returns this instance once it has checked each item within list. A complete discussion of delegates occurs in Chapter 13.
Another category of collection classes is the dictionary classes—specifically, Dictionary<TKey, TValue> (see Figure 17.3). Unlike the list collections, dictionary classes store name/value pairs. The name functions as a unique key that can be used to look up the corresponding element in a manner similar to that of using a primary key to access a record in a database. This adds some complexity to the access of dictionary elements, but because lookups by key are efficient operations, this is a useful collection. Note that the key may be any data type, not just a string or a numeric value.
One option for inserting elements into a dictionary is to use the Add() method, passing both the key and the value as arguments, as shown in Listing 17.5.
After initializing the dictionary with a dictionary initializer3 (see the section “Collection Initializers” in Chapter 15), Listing 17.5 inserts the string a ConsoleColor of White for the key of "Verbose". If an element with the same key has already been added, an exception is thrown.
An alternative for adding elements is to use the indexer, as shown in Listing 17.6.
The first thing to observe in Listing 17.6 is that the index operator does not require an integer. Instead, the index operand type is specified by the first type argument (string), and the type of the value that is set or retrieved by the indexer is specified by the second type argument (ConsoleColor).
The second thing to notice in Listing 17.6 is that the same key ("Error") is used twice. In the first assignment, no dictionary value corresponds to the given key. When this happens, the dictionary collection classes insert a new value with the supplied key. In the second assignment, an element with the specified key already exists. Instead of inserting an additional element, the prior ConsoleColor value for the "Error" key is replaced with ConsoleColor.Cyan.
Attempting to read a value from a dictionary with a nonexistent key throws a KeyNotFoundException. The ContainsKey() method allows you to check whether a particular key is used before accessing its value, thereby avoiding the exception.
The Dictionary<TKey, TValue> is implemented as a hash table; this data structure provides extremely fast access when searching by key, regardless of the number of values stored in the dictionary. By contrast, checking whether there is a particular value in the dictionary collections is a time-consuming operation with linear performance characteristics, much like searching an unsorted list. To do this, you use the ContainsValue() method, which searches sequentially through each element in the collection.
You remove a dictionary element using the Remove() method, passing the key, not the element value, as the argument.
Because both the key and the value are required to add a value to the dictionary, the loop variable of a foreach loop that enumerates elements of a dictionary must be KeyValuePair<TKey, TValue>. Listing 17.7 shows a snippet of code demonstrating the use of a foreach loop to enumerate the keys and values in a dictionary. The output appears in Output 17.3.
Note that the order of the items shown here is the order in which the items were added to the dictionary, just as if they had been added to a list. Implementations of dictionaries will often enumerate the keys and values in the order in which they were added to the dictionary, but this feature is neither required nor documented, so you should not rely on it.
If you want to deal only with keys or only with elements within a dictionary class, they are available via the Keys and Values properties, respectively. The data type returned from these properties is of type ICollection<T>. The data returned by these properties is a reference to the data within the original dictionary collection rather than a copy; changes within the dictionary are automatically reflected in the collection returned by the Keys and Values properties.
To determine whether a given key matches any existing key in the dictionary, the dictionary must be able to compare two keys for equality. This is analogous to the way that lists must be able to compare two items to determine their order. (For an example, see “Advanced Topic: Customizing Collection Sorting” earlier in this chapter.) By default, two instances of a value type are compared by checking whether they contain exactly the same data, and two instances of a reference type are compared to see whether both reference the same object. However, it is occasionally necessary to be able to compare two instances as equal even if they are not exactly the same value or exactly the same reference.
For example, suppose you wish to create a Dictionary<Contact, string> using the Contact type from Listing 17.2. However, you want any two Contact objects to compare as equal if they have the same first and last names, regardless of whether the two objects are reference equal. Much as you can provide an implementation of IComparer<T> to sort a list, so you can provide an implementation of IEqualityComparer<T> to determine if two keys are to be considered equal. This interface requires two methods: one that returns whether two items are equal and one that returns a “hash code” that the dictionary can use to facilitate fast indexing. Listing 17.8 shows an example.
To create a dictionary that uses this equality comparer, you can use the constructor new Dictionary<Contact, string>(new ContactEquality).
As discussed in Chapter 10, several important rules apply to the equality and hash code algorithms. Conformance to these rules is critical in the context of collections. Just as correctly sorting a list requires a custom ordering comparison to provide a total order, so, too, does a hash table require certain guarantees to be met by a custom equality comparison. The most important requirement is that if Equals() returns true for two objects, GetHashCode() must return the same value for those two objects. Note that the converse is not true: Two unequal items may have the same hash code. (Indeed, there must be two unequal items that have the same hash code because there are only 232 possible hash codes but many more than that number of unequal objects!)
The second-most important requirement is that two calls to GetHashCode() on the same item must produce the same result for at least as long as the item is in the hash table. Note, however, that two objects that “look equal” are not required to give the same hash code in two separate runs of a program. For example, it is perfectly legal for a given contact to be assigned one hash code today and, two weeks later when you run the program a second time, for “the same” contact to be given a different hash code. Do not persist hash codes into a database and expect them to remain stable across different runs of a program.
Ideally, the result of GetHashCode() should appear to be random. That is, small changes to the input should cause large changes to the output, and the result should be distributed roughly evenly across all possible integer values. It is difficult, however, to devise a hash algorithm that is extremely fast and produces extremely well-distributed output; try to find a good middle ground.
Finally, GetHashCode() and Equals() must not throw exceptions. Notice how the code in Listing 17.8 is careful to never dereference a null reference, for example.
To summarize, here are the key principles:
The hashing algorithm should avoid throwing exceptions in all possible object states.
The sorted collection classes (see Figure 17.4) store their elements sorted by key for SortedDictionary<TKey, TValue> and by value for SortedList<T>. If we change the code in Listing 17.7 to use a SortedDictionary<string, string> instead of a Dictionary<string, string>, the output of the program is as appears in Output 17.4.
Note that the elements are sorted into order by key, not by value.
Because sorted collections must do extra work to maintain the sorted order of their elements, insertion and removal are typically slightly slower than insertion and removal of values in an unordered dictionary.
Because sorted collections must store their items in a particular order, it is possible to access values both by key and by index. To access a key or value by its index in the sorted list, use the Keys and Values properties. They return IList<TKey> and IList<TValue> instances, respectively; the resultant collection can be indexed like any other list.
Chapter 12 discussed the stack collection classes (see Figure 17.5). The stack collection classes are designed as last in, first out (LIFO) collections. The two key methods are Push() and Pop():
To access the elements on the stack without modifying the stack, you use the Peek() and Contains() methods. The Peek() method returns the next element that Pop() will retrieve.
As with most collection classes, you use the Contains() method to determine whether an element exists anywhere in the stack. As with all collections, it is also possible to use a foreach loop to iterate over the elements in a stack. This allows you to access values from anywhere in the stack. Note, however, that accessing a value via the foreach loop does not remove it from the stack—only Pop() provides this functionality.
Queue collection classes, shown in Figure 17.6, are identical to stack collection classes, except that they follow the ordering pattern of first in, first out (FIFO). Instead of the Pop() and Push() methods, they use the Enqueue() and Dequeue() methods, respectively. The queue collection behaves like a pipe: You place objects into the queue at one end using the Enqueue() method and remove them from the other end using the Dequeue() method. As with stack collection classes, the objects do not have to be unique, and queue collection classes automatically increase in size as required. As a queue shrinks, it does not necessarily reclaim the storage space previously used, because that would make inserting a new element potentially more expensive. If you happen to know that a queue will remain the same size for a long time, however, you can hint to it that you would like to reclaim storage space by using the TrimToSize() method.
System.Collections.Generic also supports a linked list collection that enables both forward and reverse traversal. Figure 17.7 shows the class diagram. (There is no corresponding nongeneric type.)
Another collection type worth reviewing is Span<T>. It’s a special collection type that is used for accessing a sequence of elements from an existing array without reallocating all the memory required for each element in the array. For example, given an int[] instance, you can create a Span<int> instance referring to a slice of the original array without copying each of the elements into the Span<int> collection. This makes Span<int> an efficient way of working with slices of existing collections. The memory consumed by the slice essentially overlaps much of the memory of the original array—the Span<T> refers to the elements in the original collection without copying them to a new collection. Listing 17.9 provides an example and demonstrates how Span<T> references the same data pointed to by the original array.
To demonstrate the behavior of shared memory, review how we can assign a new value to languages[0] and the corresponding element in languagesSpan will also update, and vice versa. Furthermore, this behavior applies whether using a collection of reference types or value types.
There is also a ReadOnlySpan<T>, which allows using the same construct on an immutable array such as a string, rather than allocating an entirely new string. With ReadOnlySpan<T> we can have a new string that points to a slice of the old string, a significantly more performance-based approach when a slice of the original array is all that is needed.
Internally, Span<T> uses a ref struct (added in C# 7.2). That means it can only live on the stack but never on the heap. As a result, it can be used as a local variable but not as a static or instance member on a class or normal struct. For similar functionality but without the same restriction, use a Memory<T> type.
________________________________________