Search code examples
performancefortrannested-loopsintel-fortran

Avoiding Conditional Statements in Loops


There is a portion of my f90 program that is taking up a significant amount of compute time. I am basically looping through three matrices (of the same size, with dimensions as large as 250-by-250), and trying to make sure values stay bounded within the interval [-1.0, 1.0]. I know that it is best practice to avoid conditionals in loops, but I am having trouble figuring out how to re-write this block of code for optimal performance. Is there a way to "unravel" the loop or use a built-in function of some sort to "vectorize" the conditional statements?

do ind2 = 1, size(u_mat,2)
    do ind1 = 1,size(u_mat,1)
        ! Dot product 1 must be bounded between [-1,1]
        if (b1_dotProd(ind1,ind2) .GT. 1.0_dp) then
            b1_dotProd(ind1,ind2) = 1.0_dp
        else if (b1_dotProd(ind1,ind2) .LT. -1.0_dp) then
            b1_dotProd(ind1,ind2) = -1.0_dp
        end if
        ! Dot product 2 must be bounded between [-1,1]
        if (b2_dotProd(ind1,ind2) .GT. 1.0_dp) then
            b2_dotProd(ind1,ind2) = 1.0_dp
        else if (b2_dotProd(ind1,ind2) .LT. -1.0_dp) then
            b2_dotProd(ind1,ind2) = -1.0_dp
        end if
        ! Dot product 3 must be bounded between [-1,1]
        if (b3_dotProd(ind1,ind2) .GT. 1.0_dp) then
            b3_dotProd(ind1,ind2) = 1.0_dp
        else if (b3_dotProd(ind1,ind2) .LT. -1.0_dp) then
            b3_dotProd(ind1,ind2) = -1.0_dp
        end if
    end do
end do

For what it's worth, I am compiling with ifort.


Solution

  • You can use the intrinsic min and max functions for this.

    As they are both elemental, you can use them on the whole array, as

    b1_dotProd = max(-1.0_dp, min(b1_dotProd, 1.0_dp))
    

    While there are processor instructions which allow min and max to be implemented without branches, it will depend on the compiler implementation of min and max as to whether or not this is actually done and if this is actually any faster, but it is at least a lot more concise.