I'm studying multicore parallelism in F#. I have to admit that immutability really helps to write correct parallel implementation. However, it's hard to achieve good speedup and good scalability when the number of cores grows. For example, my experience with Quick Sort algorithm is that many attempts to implement parallel Quick Sort in a purely functional way and using List
or Array
as the representation are failed. Profiling those implementations shows that the number of cache misses increases significantly compared to those of sequential versions. However, if one implements parallel Quick Sort using mutation inside arrays, a good speedup could be obtained. Therefore, I think mutation might be a good practice for optimizing multicore parallelism.
I believe that cache locality is a big obstacle for multicore parallelism in a functional language. Functional programming involves in creating many short-lived objects; destruction of those objects may destroy coherence property of CPU caches. I have seen many suggestions how to improve cache locality in imperative languages, for example, here and here. But it's not clear to me how they would be done in functional programming, especially with recursive data structures such as trees, etc, which appear quite often.
Are there any techniques to improve cache locality in an impure functional language (specifically F#)? Any advices or code examples are more than welcome.
As far as I can make out, the key to cache locality (multithreaded or otherwise) is
To this end ;
In practice this means that you may end up using data structures that are not theoretically perfect examples of computer science - but that's all right, computers aren't theoretically perfect examples of computer science either.
A good academic paper on the subject is Cache-Efficient String Sorting Using Copying