Tags: c, multithreading, parallel-processing, cellular-automata

Is a mutex lock required when each thread is writing to a separate cell in a shared 2d array?


Is a mutex needed in the following parallel computation of a cellular automaton?

I thought it was always advised to use a mutex lock when writing to a shared resource from parallel threads.

The mutex is dramatically slowing the program, but is it needed to ensure that the data in the 2d array (futureOcean) remains safe/consistent?

When a mutex locks futureOcean during write operations:

2 threads: 163.352990 seconds
5 threads: 515.739237 seconds
10 threads: 1021.035517 seconds

Without the use of a mutex:

2 threads: 65.817534 seconds
10 threads: 93.822217 seconds

I'm implementing a cellular automaton model, specifically the fish, shark and ocean simulation.

I've got a 1000x600 2d array called ocean, obviously representing the ocean.

And another 1000x600 2d array called futureOcean, which will store the states of each cell for currentGeneration + 1.

Initially the ocean is populated with:

  • 50% water, i.e. 0 values
  • 25% fish, i.e. positive integer values
  • 25% sharks, i.e. negative integer values
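
A simplified sketch of this setup (ROWS, COLS, populate_ocean and the exact random split are placeholders, not my real code) looks something like this:

    #include <stdlib.h>

    #define ROWS 1000
    #define COLS 600

    int ocean[ROWS][COLS];
    int futureOcean[ROWS][COLS];

    /* Fill ocean with roughly 50% water (0), 25% fish (positive values)
     * and 25% sharks (negative values). */
    void populate_ocean(unsigned seed)
    {
        srand(seed);
        for (int r = 0; r < ROWS; r++) {
            for (int c = 0; c < COLS; c++) {
                int roll = rand() % 4;               /* 0..3 */
                if (roll < 2)
                    ocean[r][c] = 0;                 /* water, ~50% */
                else if (roll == 2)
                    ocean[r][c] = 1 + rand() % 5;    /* fish, ~25%  */
                else
                    ocean[r][c] = -(1 + rand() % 5); /* shark, ~25% */
            }
        }
    }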

Processing a single generation involves applying the rules to each cell within ocean.

Each time I compute the new value for the current cell, I store its future value in a second 2d array (futureOcean), at the same row/column position that the original value occupies in ocean. So each cell in futureOcean is updated only once per generation.

And I do this 20000 times (generations).
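
For clarity, here is a simplified sketch of that loop, continuing the arrays from the sketch above (process_cell and run_simulation are placeholder names; process_cell stands in for the actual fish/shark rules, which I've omitted):

    #include <string.h>

    #define ROWS 1000
    #define COLS 600

    extern int ocean[ROWS][COLS];        /* defined in the sketch above */
    extern int futureOcean[ROWS][COLS];

    int process_cell(int r, int c);      /* applies the rules to ocean[r][c] */

    void run_simulation(int generations)
    {
        for (int gen = 0; gen < generations; gen++) {
            /* Read only from ocean, write only to futureOcean. */
            for (int r = 0; r < ROWS; r++)
                for (int c = 0; c < COLS; c++)
                    futureOcean[r][c] = process_cell(r, c);

            /* The whole generation is computed; promote futureOcean to ocean. */
            memcpy(ocean, futureOcean, sizeof ocean);
        }
    }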


Solution

  • If ocean is solely being read from during construction of a new generation, there are no potential race conditions and no lock is needed. Multiple threads can read from a constant source without problems.

    If each location in futureOcean is only updated once, then there are no competing writes for the location, and no lock is needed. Individual threads can write to unique locations untouched by other threads without problems.

    Then, you'd have to wait for futureOcean to be completely updated before starting a new generation, to avoid reading from it while it is still being written to.

    You can speed up multithreaded performance by splitting up the work so that each thread writes to a contiguous section of the array. Otherwise, multiple threads could write to locations that are close to each other. If those locations fall within a cache line or two of each other (64 to 128 bytes on an x86-64 processor), you could be repeatedly invalidating that section of memory and forcing it to be reloaded in multiple cores' caches (see false sharing).
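
    As a rough illustration (placeholder names, not your actual code), a lock-free generation could give each thread its own contiguous band of rows, with pthread_join serving as the wait before the next generation:

        /* Build with: gcc -O2 -pthread ocean_sketch.c */
        #include <pthread.h>
        #include <string.h>

        #define ROWS        1000
        #define COLS        600
        #define NUM_THREADS 4

        int ocean[ROWS][COLS];
        int futureOcean[ROWS][COLS];

        /* Hypothetical stand-in for the real fish/shark/water rules. */
        static int process_cell(int r, int c)
        {
            return ocean[r][c];
        }

        struct band { int first_row, last_row; };   /* rows [first_row, last_row) */

        static void *update_band(void *arg)
        {
            const struct band *b = arg;
            for (int r = b->first_row; r < b->last_row; r++)
                for (int c = 0; c < COLS; c++)
                    /* Each cell is written by exactly one thread: no mutex. */
                    futureOcean[r][c] = process_cell(r, c);
            return NULL;
        }

        void run_generation(void)
        {
            pthread_t   tid[NUM_THREADS];
            struct band bands[NUM_THREADS];
            int         rows_per_thread = ROWS / NUM_THREADS;

            for (int i = 0; i < NUM_THREADS; i++) {
                bands[i].first_row = i * rows_per_thread;
                bands[i].last_row  = (i == NUM_THREADS - 1)
                                       ? ROWS
                                       : (i + 1) * rows_per_thread;
                pthread_create(&tid[i], NULL, update_band, &bands[i]);
            }

            /* Wait until futureOcean is completely written before it is read. */
            for (int i = 0; i < NUM_THREADS; i++)
                pthread_join(tid[i], NULL);

            /* Promote futureOcean to the current ocean for the next generation. */
            memcpy(ocean, futureOcean, sizeof ocean);
        }

    Because each thread owns a contiguous block of rows, adjacent threads can only touch the same cache line near a band boundary, which keeps false sharing to a minimum.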