Search code examples
multithreadingparallel-processingfortranopenmp

How can i avoid the "i" dependency in this loop? Fortran


I'm working with OpenMP in my code, but in order to do that I have to solve this dependency:

do q=1,pppp
    i=0
    
    DO j=1, pppp
        do c1=1,3
            vect(c1)=xx(q,c1)-xx(j,c1)
        end do
        dist=sqrt(vect(1)**2+vect(2)**2+vect(3)**2)
        if(dist<0.0001)then
            i=i+1
            if(i>10)i=10
            caravec(q,i)=j
        endif
    ENDDO
ENDDO  

I'm trying to avoid the ordered clause, because it is expensive, but I don't know how to remove the dependency. How can I do that?
Thanks for all the help


Solution

  • There are a few Fortran things you can do to improve on @dreamcrash' excellent answer:

    • Fortran is column-major, so you want to iterate over left-most indices in inner loops, not right-most indices. So you should probably transpose xx, so it's a 3*pppp array indexed as xx(c1,q) rather than a pppp*3 array indexed as xx(q,c1).

    • You probably want to use whole-array operations rather than single-element operations, as these have a better chance of being vectorised.

    • You can store the result of dist<0.0001 rather than dist.

    • You can replace the if (i>10)... with a min statement, which will be branchless and so likely run faster.

    So the code would look something like:

    do q=1,pppp        
      do j=1,pppp
        vect = xx(:,q)-xx(:,j)
        within_tolerance(j, q) = dot_product(vect, vect) < 1.0e-8
      enddo
    enddo
    
    do q=1,pppp
      i=0    
      do j=1,pppp
        if (within_tolerance(j, q)) then
          i = min(i+1, 10)
          caravec(q,i)=j
        endif
      enddo
    enddo
    

    You may also want to transpose caravec, but this will depend on how it is used elsewhere.

    An XY problem solution

    If this is the bottleneck of your code, you might want to look into voxel-based methods for finding nearest-neighbours within sets of vectors. A quick google brings up e.g. this.

    Voxel methods remove the need for the double loop over q and j, which can potentially let you do these kinds of comparisons much faster.

    On the other hand, voxel methods are quite complicated, and I don't think they're appropriate for all circumstances (e.g. if you have very dense regions of points and very sparse regions of points then I think a voxel method would struggle).