
C++ AMP atomics


I am rewriting an algorithm in C++ AMP and just ran into an issue with atomic writes, more specifically with atomic_fetch_add, which apparently only works on integers?
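
For reference, the integer overloads do work fine; here is a trimmed-down sketch (made-up names, just a toy histogram) of the kind of call the library does support:

#include <amp.h>
#include <vector>
using namespace concurrency;

void count_hits()
{
   std::vector<int> host(16, 0);
   array_view<int, 1> counts(16, host);

   parallel_for_each(extent<1>(1024), [=](index<1> idx) restrict(amp)
   {
      //atomic_fetch_add only comes in int* and unsigned int* flavors
      atomic_fetch_add(&counts[idx[0] % 16], 1);
   });
   counts.synchronize();
}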

I need to add a double_4 (or if I have to, a float_4) in an atomic fashion. How do I accomplish that with C++ AMP's atomics?

Is the best/only solution really to have a lock variable which my code can use to control the writes? I actually need to do atomic writes for a long list of output doubles, so I would essentially need a lock for every output.

I have already considered tiling this for better performance, but right now I am just in the first iteration.

EDIT: Thanks for the quick answers already given. I have a quick update to my question though.

I made the following lock attempt, but it seems that when one thread in a warp gets past the lock, all the other threads in the same warp just tag along. I was expecting only the first warp thread to get the lock, but I must be missing something (note that it has been quite a few years since my CUDA days, so I have probably just gotten rusty):

parallel_for_each(attracting.extent, [=](index<1> idx) restrict(amp)
{
   .....
   for (int j = 0; j < attracted.extent.size(); j++)
   {
      ...
      int lock = 0; //the expected lock value
      while (!atomic_compare_exchange(&locks[j], &lock, 1));
      //when one warp thread gets the lock, ALL threads continue on
      ...
      acceleration[j] += ...; //locked write
      locks[j] = 0; //leaving the lock again
   }
});

This is as such not a big problem, since I should accumulate into a tile-shared variable first and only write to global memory once all threads in a tile have completed, but I just don't understand this behavior.
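
For reference, this is roughly the structure I have in mind (just a sketch with made-up names and a fixed tile size, assuming the extent is a multiple of the tile size and an accelerator with double support; the per-tile results could then be combined in a second pass, or with one locked write per tile):

#include <amp.h>
using namespace concurrency;

static const int TILE = 256;

void tile_partial_sums(array_view<const double, 1> input,
                       array_view<double, 1> tile_sums) //one element per tile
{
   parallel_for_each(input.extent.tile<TILE>(), [=](tiled_index<TILE> tidx) restrict(amp)
   {
      tile_static double partial[TILE];
      partial[tidx.local[0]] = input[tidx.global[0]]; //this thread's contribution
      tidx.barrier.wait();                            //whole tile is done writing

      if (tidx.local[0] == 0)                         //one global write per tile
      {
         double sum = 0.0;
         for (int i = 0; i < TILE; i++)
            sum += partial[i];
         tile_sums[tidx.tile[0]] = sum;               //no race: one slot per tile
      }
   });
}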


Solution

  • The question as such has already been answered by others: you have to handle double atomics yourself, as there is no function for it in the library.

    I would also like to elaborate on my own edit, in case others come here with the same failing lock.

    In the following example, my error was in not realizing that when the exchange fails, it actually changes the expected value! So the first thread expects lock to be zero and writes a 1 into locks[j]. The next thread also expects 0 and fails to write a 1 - but the failed exchange then writes the observed 1 into its variable holding the expected value. This means that the next time that thread tries the exchange, it expects a 1 in the lock! That is exactly what it finds, so it thinks it has acquired the lock.

    I was absolutely not aware that &lock would receive a 1 when the exchange failed to match!

    parallel_for_each(attracting.extent, [=](index<1> idx) restrict(amp)
    {
       .....
       for (int j = 0; j < attracted.extent.size(); j++)
       {
          ...
          int lock = 0; //the expected lock value
    
      //note that, if locks[j]!=lock then lock=1
      //meaning that ACE will be true the next time if locks[j]==1
      //meaning the while will terminate even though someone else has the lock
          while (!atomic_compare_exchange(&locks[j], &lock, 1));
          //when one warp thread gets the lock, ALL threads continue on
          ...
          acceleration[j] += ...; //locked write
          locks[j] = 0; //leaving the lock again
       }
    });
    

    It seems that the fix is to reset the expected value whenever the exchange fails:

    parallel_for_each(attracting.extent, [=](index<1> idx) restrict(amp)
    {
       .....
       for (int j = 0; j < attracted.extent.size(); j++)
       {
          ...
          int lock = 0; //the expected lock value
    
          while (!atomic_compare_exchange(&locks[j], &lock, 1))
      {
         lock = 0; //reset the expected value
      }
          //when one warp thread gets the lock, ALL threads continue on
          ...
          acceleration[j] += ...; //locked write
          locks[j] = 0; //leaving the lock again
       }
    });
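
    One more small thing, which is more of a style point than something I have verified is required: the release at the end is still a plain store (locks[j] = 0;). If you prefer the unlock to go through the atomic functions as well, atomic_exchange should do the job:

       atomic_exchange(&locks[j], 0); //release the lock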