I recently wrote some code that was using Parallel.ForEach() and in the function called during the Parallel.ForEach() I nested another call to Parallel.ForEach() to process a second array. This then got me thinking. If the first array that you are processing is larger than the number of cores on the machine that you're running it on then there's no advantage to nesting a second Parallel.ForEach(). You might as well just loop in that function because we'll already have all the processors saturated. So I put together a test to see the difference between Parallel.ForEach() with a regular loop and Parallel.ForEach() with a nested Parallel.ForEach()
static void ParallelMainNestedTest()
{
int[] lowerRange = Enumerable.Range(0, 10).ToArray();
int[] upperRange = Enumerable.Range(1000, 1010).ToArray();
Stopwatch timer = Stopwatch.StartNew();
Parallel.ForEach(lowerRange, lowerValue =>
DoItemNotNested(lowerValue, upperRange));
Console.WriteLine("Not Nested: {0}", timer.Elapsed);
timer.Restart();
Parallel.ForEach(lowerRange, lowerValue =>
DoItemNested1(lowerValue, upperRange));
Console.WriteLine("Nested: {0}", timer.Elapsed);
Console.ReadKey();
}
static void DoItemNotNested(int lowerValue, int[] upperRange)
{
foreach (int upperValue in upperRange)
{
Thread.Sleep(10);
}
}
static void DoItemNested1(int lowerValue, int[] upperRange)
{
Parallel.ForEach(upperRange, upperValue =>
DoItemNested2(lowerValue, upperValue));
}
static void DoItemNested2(int lowerValue, int upperValue)
{
Thread.Sleep(10);
}
The results of this test showed:
Not Nested: 00:00:14.6155628
Nested: 00:00:09.0907248
Now the reason that I put a sleep in the inner loop of the two tests is because this test code is designed to simulate network access (a REST call to a site) and I wanted to create a test where there would be idle wait time to determine which strategy is better. I appears, as you can see, that a pair of nested Parallel.ForEach() calls is most efficient in this scenario.
What if there was no latency in the inner loop and it was all down to the speed of the processor?
My suspicion is that the nested Parallel.ForEach() would be slower because of the extra overhead in managing the extra threads. To test this I replaced the Thread.Sleep(10) call with the following:
static Random r = new Random(Guid.NewGuid().GetHashCode());
private static void DoSomething(int lowerValue, int upperValue)
{
for (int i = 0; i < 5000; i++)
{
int rand = r.Next(lowerValue, upperValue);
rand = rand * rand;
}
}
The result:
Not Nested: 00:00:05.6169188
Nested: 00:00:06.1500767
Confirmed: The nested Parallel.ForEach() shows a degradation in performance because of the extra overhead in managing all those extra threads.
Conclusion: Used nested Parallel.ForEach()'s when you know that the inner loop will be waiting on external resources (database, disk, network etc.). Use regular looping in the inner loop if you are doing a math or algorithmic intensive calculation that relies on the CPU alone.