I have some basic questions about Parallel.ForEach with the partition (thread-local) approach, and I'm running into some problems with it, so I'd like to understand how this code works and what its flow is.
var result = new StringBuilder();
Parallel.ForEach(Enumerable.Range(1, 5), () => new StringBuilder(), (x, option, sb) =>
{
    sb.Append(x);
    return sb;
}, sb =>
{
    lock (result)
    {
        result.Append(sb.ToString());
    }
});
Questions related to the code above:
Is some partitioning work happening inside Parallel.ForEach?
When I debug the code, I can see that the iteration (execution) of the code happens more than 5 times, but as I understand it, it is supposed to fire only 5 times because of Enumerable.Range(1, 5).
When does this code get fired? In both Parallel.ForEach and Parallel.For there are two blocks separated by {}. How do these two blocks execute and interact with each other?
lock (result)
{
    result.Append(sb.ToString());
}
Bonus Q:
See this block of code, where the 5 iterations do not occur; instead, more iterations take place when I use Parallel.For instead of ForEach. See the code and tell me where I made the mistake.
var result = new StringBuilder();
Parallel.For(1, 5, () => new StringBuilder(), (x, option, sb) =>
{
    sb.Append("line " + x + System.Environment.NewLine);
    MessageBox.Show("aaa" + x.ToString());
    return sb;
}, sb =>
{
    lock (result)
    {
        result.Append(sb.ToString());
    }
});
There are several misunderstandings regarding how the Parallel.XYZ methods work.
A couple of great points and suggestions have been mentioned in the comments, so I won't repeat them. Instead, I would like to share some thoughts about parallel programming.
Whenever we talk about parallel programming we usually distinguish two kinds: data parallelism and task parallelism. The former executes the same function(s) over chunks of data in parallel. The latter executes several independent functions in parallel.
(There is also a third model, the pipeline, which is kind of a mixture of the two. I won't spend time on it here; if you are interested in it, I suggest searching for the Task Parallel Library's Dataflow or System.Threading.Channels.)
The Parallel class supports both models: For and ForEach are designed for data parallelism, while Invoke is designed for task parallelism.
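To make the distinction concrete, here is a minimal sketch; the printed messages are just placeholders for real work:
using System;
using System.Threading.Tasks;

// Data parallelism: the same operation is applied to every element of a range.
Parallel.For(0, 100, i =>
{
    Console.WriteLine($"Processing item {i}");
});

// Task parallelism: several unrelated operations run concurrently.
Parallel.Invoke(
    () => Console.WriteLine("Loading configuration..."),
    () => Console.WriteLine("Warming up a cache..."),
    () => Console.WriteLine("Pinging a remote service..."));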
In the case of data parallelism the tricky part is how you slice your data to get the best throughput/performance. You have to take into account the size of the data collection, the structure of the data, the processing logic, and the available cores (and many other aspects as well). So there is no one-rule-fits-all suggestion.
The main concern with partitioning is to neither under-use the resources (some cores are idle while others are working hard) nor over-use them (there are far more waiting jobs than available cores, so the synchronization overhead can become significant).
Let's suppose your processing logic is fairly stable (in other words, varying input data does not significantly change the processing time). In this case you can load-balance the data between the executors: whenever an executor finishes, it grabs the next piece of data to be processed.
How you choose which data should go to which executor can be defined by a Partitioner (1). Out of the box .NET supports range, chunk, hash, and striped partitioning. Some are static (the partitioning is done before any processing starts) and some are dynamic (depending on processing speed, some executors may receive more data than others).
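As an illustration, here is a sketch of range partitioning with Partitioner.Create; the array size and the block size of 10,000 are arbitrary values chosen for the example:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var data = new double[1_000_000];

// Range partitioning: each worker gets a contiguous block of indices instead of
// single items, which keeps synchronization overhead low for cheap loop bodies.
Parallel.ForEach(Partitioner.Create(0, data.Length, 10_000), range =>
{
    // range is a Tuple<int, int>: inclusive start and exclusive end of the block.
    for (int i = range.Item1; i < range.Item2; i++)
    {
        data[i] = Math.Sqrt(i);
    }
});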
The following two excellent articles can give you a better insight into how each partitioning strategy works:
If each executor can perform its processing task without needing to interact with the others, then they are considered independent. If you can design your algorithm around independent processing units, you minimize synchronization.
In the case of For and ForEach, each partition can have its own partition-local storage. That means the computations are independent because the intermediate results are stored in partition-aware storage. But, as usual, you eventually want to merge these into a single collection or even a single value.
That's the reason why these Parallel methods have body and localFinally parameters. The former defines the individual processing, while the latter is the aggregate-and-merge function (it is somewhat similar to the Map-Reduce approach). In the latter you have to take care of thread safety yourself.
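To tie this back to the code in the question, here is the same snippet with comments marking where each of the three delegates (localInit, body, localFinally) fits into this flow:
using System;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

var result = new StringBuilder();

Parallel.ForEach(
    Enumerable.Range(1, 5),
    // localInit: runs once per worker task (partition), not once per item.
    // Seeing this (and localFinally) fire in the debugger is why the loop
    // appears to execute more often than the 5 items.
    () => new StringBuilder(),
    // body: runs once per item; sb is the partition-local StringBuilder,
    // so no locking is needed here.
    (x, state, sb) =>
    {
        sb.Append(x);
        return sb;
    },
    // localFinally: runs once per worker task to merge its partial result
    // into the shared one; this is the only place that needs a lock.
    sb =>
    {
        lock (result)
        {
            result.Append(sb.ToString());
        }
    });

Console.WriteLine(result);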
I don't want to explore the thread-safety topic further, as it is outside the scope of the question, but I would like to give you a nudge on where to get started:
EDIT: How to decide whether it's worth running in parallel?
There is no single formula (at least to my knowledge) that will tell you when it makes sense to use parallel execution. As I tried to highlight in the partitioning section, it is quite a complex topic, so several experiments and some fine-tuning are needed to find the optimal solution.
I highly encourage you to measure and try several different settings.
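As a rough sketch of such a measurement (SimulateWork is just a hypothetical stand-in for your actual per-item processing):
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static void SimulateWork(int i) => Thread.SpinWait(100_000);

var sw = Stopwatch.StartNew();
for (int i = 0; i < 10_000; i++) SimulateWork(i);
sw.Stop();
Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms");

sw.Restart();
Parallel.For(0, 10_000, SimulateWork);
sw.Stop();
Console.WriteLine($"Parallel:   {sw.ElapsedMilliseconds} ms");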
Here is my guideline for how you should tackle this:
AsParallel does not guarantee that the query will actually run in parallel. .NET analyzes your query and data structures and decides how to run it. On the other hand, you can enforce parallel execution if you are certain that it will help.
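For example, assuming a simple PLINQ query, WithExecutionMode(ParallelExecutionMode.ForceParallelism) is the switch that overrides that analysis:
using System;
using System.Linq;

var numbers = Enumerable.Range(1, 1_000);

// By default PLINQ may decide the query is too cheap and keep it sequential.
var sum = numbers.AsParallel().Select(n => n * n).Sum();

// ForceParallelism makes the query run in parallel, whether or not it pays off.
var forcedSum = numbers
    .AsParallel()
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    .Select(n => n * n)
    .Sum();

Console.WriteLine($"{sum} / {forcedSum}");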