21

Iterating in Parallel

In Chapter 19, we mentioned how the cost of computation is falling, resulting in computers with faster CPUs, more CPUs, and more cores in each CPU. Collectively, these trends make it increasingly economical to parallelize execution to take advantage of the additional computing power. In this chapter, we look at executing loops in parallel, which is one of the easiest ways to take advantage of this increased computing capability. Much of the chapter consists of Beginner and Advanced Topic blocks.

Executing Loop Iterations in Parallel

Consider the for loop statement and associated code shown in Listing 10.1 and the corresponding results in Output 10.1. The listing calls a method for calculating a section of the decimal expansion of pi, where the parameters are the number of digits and the digit to start with. The actual calculation is not germane to the discussion. What is interesting about this calculation is that it is embarrassingly parallelizable; that is, it is remarkably easy to split up a large task—say, computing 1 million decimal digits of pi—into any desired number of smaller tasks that can all be run in parallel. These types of computations are the easiest ones to speed up by adding parallelism.

Listing 10.1: A for Loop Synchronously Calculating Pi in Sections
// Sketch reconstructed from the surrounding text; the
// PiCalculator.Calculate(int digits, int startingAt) method,
// which computes a section of the decimal expansion of pi,
// is assumed.
using System;
 
public class Program
{
    const int TotalDigits = 500;
    const int BatchSize = 10;
 
    public static void Main()
    {
        string pi = "3.";
        const int iterations = TotalDigits / BatchSize;
        for(int i = 0; i < iterations; i++)
        {
            // Append each section in order as it is calculated.
            pi += PiCalculator.Calculate(BatchSize, i * BatchSize);
        }
        Console.WriteLine(pi);
    }
}
Output 10.1
>3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847564823378678316527120190914564856692346034861045432664821339360726024914127372458700660631558817488152092096282925409171536436789259036001133053054882046652138414695194151160943305727036575959195309218611738193261179310511854807446237996274956735188575272489122793818301194912

The for loop executes each iteration synchronously and sequentially. However, because the pi calculation algorithm splits the calculation into independent pieces, it is not necessary to compute the pieces sequentially, provided the results are appended in the right order. Imagine what would happen if you could have all the iterations of this loop run concurrently: Each processor could take a single iteration and execute it in parallel with other processors executing other iterations. Given the simultaneous execution of iterations, we could decrease the execution time roughly in proportion to the number of processors.

The Task Parallel Library (TPL) provides a convenient method, Parallel.For(), which does precisely that. Listing 10.2 shows how to modify the sequential, single-threaded program in Listing 10.1 to use the helper method.

Listing 10.2: For Loop Calculating Pi in Sections in Parallel
// Sketch reconstructed from the surrounding text; PiCalculator
// is assumed as in Listing 10.1.
using System;
using System.Threading.Tasks;
 
public class Program
{
    const int TotalDigits = 500;
    const int BatchSize = 10;
 
    public static void Main()
    {
        string pi = "3.";
        const int iterations = TotalDigits / BatchSize;
        string[] sections = new string[iterations];
        // Iterations may run in parallel; each writes to a distinct
        // array element, so no two iterations share state.
        Parallel.For(0, iterations, i =>
        {
            sections[i] =
                PiCalculator.Calculate(BatchSize, i * BatchSize);
        });
        // Parallel.For() does not return until all iterations
        // have completed.
        pi += string.Join("", sections);
        Console.WriteLine(pi);
    }
}

The output for Listing 10.2 is identical to Output 10.1; however, the execution time is significantly faster if you have multiple CPUs (and possibly slower if you do not). The Parallel.For() API is designed to look similar to a standard for loop. The first parameter is the fromInclusive value, the second is the toExclusive value, and the last is the Action<int> to perform as the loop body. When using an expression lambda for the action, the code looks similar to a for loop statement except that now each iteration may execute in parallel. As with the for loop, the call to Parallel.For() will not complete until all iterations are complete. In other words, by the time execution reaches the string.Join() statement, all sections of pi will have been calculated.

Note that the code for combining the various sections of pi no longer occurs inside the iteration (action) in Listing 10.2. As sections of the pi calculation will very likely not complete sequentially, appending a section whenever an iteration completes would likely append them out of order. Even if sequencing were not a problem, there is still a potential race condition because the += operator in Listing 10.1 is not atomic. To address both problems, each section of pi is stored into an array, and no two iterations access the same element within the array. Only once all sections of pi are calculated does string.Join() combine them. In other words, we postpone concatenating the sections until after the Parallel.For() loop has completed. This avoids any race condition caused by sections not yet calculated or sections concatenating out of order.

The TPL uses the same sorts of thread pooling techniques that it uses for task scheduling to ensure good performance of the parallel loop: It will try to ensure that CPUs are not overscheduled and so on.

Guidelines
DO use parallel loops when the computations performed can be easily split up into many mutually independent processor-bound computations that can be executed in any order on any thread.

The TPL also provides a similar parallel version of the foreach statement, as shown in Listing 10.3.

Listing 10.3: Parallel Execution of a foreach Loop
// Sketch reconstructed from the surrounding text; the Encrypt()
// method and the source of the files collection are assumed.
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
 
public static class Program
{
    public static void EncryptFiles(string directoryPath)
    {
        IEnumerable<string> files =
            Directory.EnumerateFiles(directoryPath, "*.txt");
        // Each file may be encrypted on a different thread.
        Parallel.ForEach(files, fileName =>
        {
            Encrypt(fileName);
        });
    }
 
    private static void Encrypt(string fileName)
    {
        // ...
    }
}

In this example, we call a method that encrypts each file within the files collection. It does so in parallel, executing as many threads as the TPL determines is efficient.

Advanced Topic
How the TPL Tunes Its Own Performance

The default scheduler within the TPL targets the thread pool, resulting in a variety of heuristics that try to ensure the right number of threads are executing at any one time. Two of the heuristics it uses are hill climbing and work stealing.

The hill-climbing algorithm involves creating threads to run tasks, and then monitoring the performance of those tasks to experimentally determine the point at which adding more threads begins making performance worse. Once that point is reached, the number of threads can then be decreased back to the number that produced the best performance.

The TPL does not associate top-level tasks that are waiting to be executed with any particular thread. If, however, a task running on a thread itself creates another task, the newly created task is associated with that thread automatically. When the new child task is eventually scheduled to run, it usually runs on the same thread as the task that created it. The work-stealing algorithm identifies threads that have an unusually large or unusually small amount of pending work; a thread that has too few tasks associated with it will sometimes “steal” not-yet-executed tasks from threads that have too many tasks waiting to run.

The key feature of these algorithms is that they enable the TPL to dynamically tune its own performance to mitigate processor overscheduling and underscheduling and to balance the work among the available processors.

The TPL generally does a good job of tuning its own performance, but you can help it do a better job by providing hints about the best course of action. Specifying the TPL TaskCreationOptions.LongRunning option described in the section “Long-Running Tasks” in Chapter 19 is an example of such a hint. You can also explicitly tell the task scheduler how many threads you think would be best to service a parallel loop; see Advanced Topic: Parallel Loop Options later in this chapter for more details.

Beginner Topic
Parallel Loop Exception Handling with AggregateException

The TPL catches and saves exceptions associated with tasks in an AggregateException, because a given task might have several exceptions obtained from its subtasks. This is also the case with parallel execution of loops: Each iteration could have produced an exception, so the exceptions need to be gathered up into one aggregating exception. Consider the example in Listing 10.4 and its results in Output 10.2.

Listing 10.4: Unhandled Exception Handling for Parallel Iterations
// Sketch reconstructed from the surrounding text; Encrypt() and
// the files collection are assumed as in Listing 10.3.
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
 
public static class Program
{
    public static void EncryptFiles(string directoryPath)
    {
        IEnumerable<string> files =
            Directory.EnumerateFiles(directoryPath, "*.txt");
        try
        {
            Parallel.ForEach(files, fileName =>
            {
                Encrypt(fileName);
            });
        }
        catch(AggregateException exception)
        {
            Console.WriteLine("ERROR: {0}:",
                exception.GetType().Name);
            // Each iteration's exception is gathered into the
            // InnerExceptions property.
            foreach(Exception item in exception.InnerExceptions)
            {
                Console.WriteLine("  {0} - {1}",
                    item.GetType().Name, item.Message);
            }
        }
    }
 
    private static void Encrypt(string fileName)
    {
        // ...
    }
}
Output 10.2
ERROR: AggregateException:
  UnauthorizedAccessException - Attempted to perform an unauthorized operation.
  UnauthorizedAccessException - Attempted to perform an unauthorized operation.
  UnauthorizedAccessException - Attempted to perform an unauthorized operation.

Output 10.2 shows that three exceptions occurred while executing the Parallel.ForEach<T>(...) loop. However, the code shows only one catch, of type System.AggregateException. The UnauthorizedAccessExceptions were retrieved from the InnerExceptions property on the AggregateException. With a Parallel.ForEach<T>() loop, each iteration could potentially throw an exception, so the System.AggregateException thrown by the method call will contain each of those exceptions within its InnerExceptions property.

Canceling a Parallel Loop

Unlike a task, which requires an explicit call to block until it completes, a parallel loop executes iterations in parallel but does not itself return until the entire parallel loop completes. Canceling a parallel loop, therefore, generally involves invoking the cancellation request from a thread other than the one executing the parallel loop. In Listing 10.5, we invoke Parallel.ForEach<T>() using Task.Run(). In this manner, not only does the loop execute in parallel, but it also executes asynchronously, allowing the code to prompt the user to “Press any key to exit.”

Listing 10.5: Canceling a Parallel Loop
// Sketch reconstructed from the surrounding text; Encrypt() and
// the files collection are assumed as in Listing 10.3.
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
 
public static class Program
{
    public static void Main()
    {
        IEnumerable<string> files = Directory.EnumerateFiles(
            Directory.GetCurrentDirectory(), "*.txt");
 
        using CancellationTokenSource cts = new();
        cts.Token.Register(() => Console.WriteLine("Canceling..."));
        Console.WriteLine("Press any key to exit.");
 
        // Run the loop on another thread so that this thread
        // remains free to watch for a keystroke.
        Task task = Task.Run(() =>
        {
            try
            {
                Parallel.ForEach(
                    files,
                    new ParallelOptions
                        { CancellationToken = cts.Token },
                    fileName => Encrypt(fileName));
            }
            catch(OperationCanceledException)
            {
                // Cancellation is expected; ignore the exception.
            }
        });
 
        Console.Read();
        cts.Cancel();
        task.Wait();
        Console.WriteLine(new string('*', 40));
    }
 
    private static void Encrypt(string fileName)
    {
        // ...
    }
}

The parallel loops use the same cancellation token pattern that tasks use. The token obtained from a CancellationTokenSource is associated with the parallel loop by calling an overload of the ForEach() method that has a parameter of type ParallelOptions. This object contains the cancellation token.

Note that if you cancel a parallel loop operation, any iterations that have not started yet are prevented from starting by checking the IsCancellationRequested property. Existing executing iterations will run to their respective termination points. Furthermore, calling Cancel() even after all iterations have completed will still cause the registered cancel event (via cts.Token.Register()) to execute.

The ForEach() method acknowledges that the loop has been canceled by throwing an OperationCanceledException. Given that cancellation in this example is expected, the exception is caught and ignored, allowing the application to display “Canceling…”, followed by a line of stars, before exiting.

Advanced Topic
Parallel Loop Options

Although not generally necessary, it is possible to control the maximum degree of parallelism (i.e., the number of threads that are scheduled to run at the same time) via the ParallelOptions parameter on overloads of both the Parallel.For() and Parallel.ForEach<T>() loops. In some specific cases, the developer may know more about the particular algorithm or circumstance such that changing the maximum degree of parallelism makes sense. These circumstances include the following:

Scenarios where you want to disable parallelism to make debugging or analysis easier. Setting the maximum degree of parallelism to 1 ensures that the loop iterations do not run concurrently.
Scenarios where you know ahead of time that the degree of parallelism will be gated on an external factor such as a hardware constraint. For example, if your parallel operation involves using multiple USB ports, there might be no point in creating more threads than there are available ports.
Scenarios with really long-running loop iterations (e.g., minutes or hours). The thread pool can’t distinguish long-running iterations from blocked operations, so it could end up introducing many new threads, all of which will be consumed by the for loop. This can result in incremental thread growth over time, resulting in a huge number of threads in the process.

To control the maximum degree of parallelism, use the MaxDegreeOfParallelism property on the ParallelOptions object.
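The following minimal sketch demonstrates the property; the concurrency-tracking logic exists purely to make the effect observable and is not part of the TPL pattern itself. With the degree of parallelism capped at 1, no two iterations ever overlap:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class Program
{
    public static int MaxObservedConcurrency()
    {
        int running = 0;
        int maxObserved = 0;
        ParallelOptions options = new()
        {
            // Capping the degree of parallelism at 1 forces the
            // iterations to run one at a time.
            MaxDegreeOfParallelism = 1
        };
        Parallel.For(0, 20, options, i =>
        {
            int now = Interlocked.Increment(ref running);
            // Record the highest number of simultaneously
            // executing iterations observed so far.
            int snapshot;
            do
            {
                snapshot = maxObserved;
            }
            while(now > snapshot && Interlocked.CompareExchange(
                ref maxObserved, now, snapshot) != snapshot);
            Thread.Sleep(1);
            Interlocked.Decrement(ref running);
        });
        return maxObserved;
    }

    public static void Main() =>
        Console.WriteLine(MaxObservedConcurrency()); // prints 1
}
```

Setting the value to the number of available USB ports, rather than 1, would address the hardware-constraint scenario described above.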

You can also use the ParallelOptions object’s TaskScheduler property to specify a custom task scheduler to use when scheduling the tasks associated with each iteration. For example, you might have an asynchronous event handler that responds to the user’s click of a Next button. If the user clicks the button several times, you might want to use a custom task scheduler that prioritizes the most recently created task rather than prioritizing the task that has waited the longest. The task scheduler provides a means of specifying how the tasks will execute in relation to one another.

The ParallelOptions object also has a CancellationToken property that provides a mechanism to communicate to the loop that no further iterations should start. Additionally, the body of an iteration can watch the cancellation token to determine if an early exit from the iteration is in order.

Advanced Topic
Breaking a Parallel Loop

Like a standard for loop, the Parallel.For() loop supports the concept of “breaking” to exit the loop and canceling any further iterations. In the context of parallel for execution, however, a break signifies that no new iterations following the breaking iteration should start. All currently executing iterations, however, will run to completion.

To break a parallel loop, you can provide a cancellation token and cancel it on another thread, as described in the preceding Advanced Topic: Parallel Loop Options. You can also use an overload of the Parallel.For() method whose body delegate takes two parameters: the index and a ParallelLoopState object. An iteration that wishes to break the loop can call the Break() or Stop() method on the loop state object passed to the delegate. The Break() method indicates that no more iterations with index values higher than the current value need to execute; the Stop() method indicates that no more iterations need to run at all.

For example, suppose you have a Parallel.For() loop that is performing ten iterations in parallel. Some of those iterations might run faster than others, and the task scheduler does not guarantee that they will run in any particular order. Suppose the first iteration has completed; iterations 3, 5, 7, and 9 are “in flight,” scheduled to four different threads; and iterations 5 and 7 both call Break(). In this scenario, iterations 6, 8, and 10 will never start, but iterations 2 and 4 will still be scheduled to run. Iterations 3 and 9 will run to completion because they had already started when the break happened.

The Parallel.For() and Parallel.ForEach<T>() methods return a reference to a ParallelLoopResult object that contains useful information about what happened during the loop. This result object has the following properties:

IsCompleted returns a Boolean indicating whether the loop ran to completion (that is, whether all iterations executed without a break or stop).
LowestBreakIteration identifies the lowest iteration that executed a break. The value is of type long?, where a value of null indicates that no call to Break() was encountered.

Returning to the ten-iteration example, the IsCompleted property will return false and the LowestBreakIteration will return a value of 5.
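As a minimal sketch of this behavior, the following loop uses the two-parameter body overload to break at index 5 and then inspects the result object:

```csharp
using System;
using System.Threading.Tasks;

public class Program
{
    public static ParallelLoopResult BreakAtFive()
    {
        // The body overload with a ParallelLoopState parameter
        // enables breaking out of the loop.
        return Parallel.For(0, 10, (i, loopState) =>
        {
            if(i == 5)
            {
                // Request that no iteration with an index above 5
                // be started.
                loopState.Break();
            }
        });
    }

    public static void Main()
    {
        ParallelLoopResult result = BreakAtFive();
        Console.WriteLine(result.IsCompleted);          // False
        Console.WriteLine(result.LowestBreakIteration); // 5
    }
}
```

Because only the iteration with index 5 calls Break(), LowestBreakIteration reports 5, and IsCompleted is false since a break occurred.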
