There is a portion of my f90 program that is taking up a significant amount of compute time. I am basically looping through three matrices (of the same size, with dimensions as large as 250-by-250), and trying to make sure values stay bounded within the interval [-1.0, 1.0]. I know that it is best practice to avoid conditionals in loops, but I am having trouble figuring out how to re-write this block of code for optimal performance. Is there a way to "unravel" the loop or use a built-in function of some sort to "vectorize" the conditional statements?
do ind2 = 1, size(u_mat,2)
do ind1 = 1,size(u_mat,1)
! Dot product 1 must be bounded between [-1,1]
if (b1_dotProd(ind1,ind2) .GT. 1.0_dp) then
b1_dotProd(ind1,ind2) = 1.0_dp
else if (b1_dotProd(ind1,ind2) .LT. -1.0_dp) then
b1_dotProd(ind1,ind2) = -1.0_dp
end if
! Dot product 2 must be bounded between [-1,1]
if (b2_dotProd(ind1,ind2) .GT. 1.0_dp) then
b2_dotProd(ind1,ind2) = 1.0_dp
else if (b2_dotProd(ind1,ind2) .LT. -1.0_dp) then
b2_dotProd(ind1,ind2) = -1.0_dp
end if
! Dot product 3 must be bounded between [-1,1]
if (b3_dotProd(ind1,ind2) .GT. 1.0_dp) then
b3_dotProd(ind1,ind2) = 1.0_dp
else if (b3_dotProd(ind1,ind2) .LT. -1.0_dp) then
b3_dotProd(ind1,ind2) = -1.0_dp
end if
end do
end do
For what it's worth, I am compiling with ifort
.
You can use the intrinsic min and max functions for this.
As they are both elemental, you can use them on the whole array, as
b1_dotProd = max(-1.0_dp, min(b1_dotProd, 1.0_dp))
While there are processor instructions which allow min
and max
to be implemented without branches, it will depend on the compiler implementation of min
and max
as to whether or not this is actually done and if this is actually any faster, but it is at least a lot more concise.