Search code examples
parallel-processingf#mutable

When can I call a function using mutable variables in parallel?


After seeing an interesting lecture by Phil Trelford

https://www.youtube.com/watch?v=hx2vOwbB-X0

I was intrigued about the possibility of accelerating my code by replacing lists with arrays and more generally, of using mutable variables. So I did a simple test:

let xs = [0.0..1000000.00]
let axs = List.toArray xs

let f x = sin (x * x)

List.map f xs // Real: 00:00:00.170, CPU: 00:00:00.187, GC gen0: 5, gen1: 3, gen2: 1

Array.map f axs // Real: 00:00:00.046, CPU: 00:00:00.046, GC gen0: 0, gen1: 0, gen2: 0

Mapping through the array was more than three times faster than mapping through the list. At this point I have not yet tested the speed difference when the function called is more computationally intensive. The difference could be due just to being faster to move through the items in an array, and could become insignificant when each iteration is computationally intensive.

Still, there must be cases in which using arrays or more generally mutable variables could make a significant difference.

Before changing my code to use arrays instead of lists I would like to have a clearer picture of the consequences when the code is parallelized.

In general, when can one use mutable variables without risking having problems with parallelized code? Is there a simple test that would allow me to ascertain the robustness of a function when called in parallel?


Solution

  • The speed difference with arrays has nothing to do with mutability; it's all about cache locality. Arrays are contiguous in memory, so they are faster to iterate through than lists: F# lists are singly-linked lists, so every item can be (and usually is) at a different memory location. This means that you don't benefit from the CPU's cache, whereas with arrays, once you've paid the cost of retrieving the first item from memory, then the 2nd through Nth items (where the value of N depends on the size of the items you're retrieving) are already in the cache and ready for nearly-instant retrieval. If F# had an ImmutableArray class and you used that, you'd get the same speed benefits when mapping through that ImmutableArray as from your mutable array.

    As for your main question, about when it's safe to use mutable variables with parallel code, the simple test is to ask "Am I actually mutating the data that multiple threads are using?" If you're not mutating your data, then it's safe to have multiple threads accessing it in parallel. Even if the data could be mutated (e.g., an array), as long as you aren't actually mutating it, then your parallel code won't run into problems. If you do mutate the data, then you have to deal with locking, and all the host of problems that come along with locking such as resource starvation, deadlocks, and so on.

    So the simple rule of thumb is "Mutating data + Parallelism = Pain". If you mutate your data but aren't running parallel code, you'll have far less pain. If you aren't mutating your data, then parallel code won't cause you pain. But if you're doing both, get ready for headaches.