haskell concurrency parallel-processing massiv

How to atomically modify a value within a mutable array?

I have this mutable array in an ST monad. And I have this loop function.

 runST $ do
    myarray <- newMArray (Sz 10) 0
    loopM_ 0 (<10) (+1) (\j ->
      loopM_ 0 (<10) (+1) (\i ->
        when (mytruthcheck j i)
             (modifyM_ myarray (pure . (+1)) ((funcofji j i) :: Int)
     )))

I want to use forkST_ to run the outer loop in parallel like this.

runST $ do
   myarray <- newMArray (Sz 10) 0
   loopM_ 0 (<10) (+1) (\j ->
     void (forkST_ (loopM_ 0 (<10) (+1) (\i ->
       when (mytruthcheck j i)
            (Data.Massiv.Array.Mutable.modifyM_
                 myarray (pure . (+1)) ((funcofji j i) :: Int)
    ))))

But I’m guessing this will cause thread collisions but I don’t really know although I do know it is possible that funcofji can output the same value for different values of j and therefore the loop can modify the same index of myarray for different j s. Is there a way to ensure this is done atomically or is that already the case?

Btw here’s the loopM_ function

loopM_ :: Monad m => Int -> (Int -> Bool) -> (Int -> Int) -> (Int -> m a) -> m ()
loopM_ !init' condition increment f = go init'
  where
    go !step
      | condition step = f step >> go (increment step)
      | otherwise = pure ()

Solution

As it was mentioned in the comments, atomic modification is only useful for concurrency ~~which doesn't look like what is needed here. What you need is parallelism.~~, which is available in massiv for Ints: atomicAddIntArray

There is also a builtin way to do parallelism in massiv very efficiently, so there is definitely no need to reinvent the wheel:

  createArray_ Par (Sz 10) $ \scheduler myarray ->
    loopM_ 0 (<10) (+1) $ \j ->
      loopM_ 0 (<10) (+1) $ \i ->
        when (mytruthcheck j i) $ 
          scheduleWork_ scheduler $ 
            void $ atomicAddIntArray myarray ((funcofji j i) :: Int) 1

Also don't lie to yourself, ST (state thread) was not build for multi-threading, use IO instead. However, if you can guarantee, that despite multi-threaded setup, the outcome that is produced is still deterministic in the end, then it is ok to use unsafePerformIO.

Edit

I just noticed this comment:

The loop size of j is close to 100,000 and the loop size of i is close to 1 Billion.

Which makes me believe that it will be better to parallelize it in this way instead:

  createArray_ Par (Sz 10) $ \scheduler myarray ->
    iforSchedulerM_ scheduler (0 ..: (100000 :. 1000000000)) $ \_ (j :. i) ->
      when (mytruthcheck j i) $ 
        void $ atomicAddIntArray myarray ((funcofji j i) :: Int) 1

This will ensure that you only schedule a few jobs, instead of billions. Check out iforSchedulerM_ implementation in order to customize it parallelization if you have more insight into your per i and j workload.