Search code examples
fortranranking

Efficient algorithm for iterative ranking of a vector


Suppose I've a scrambled vector of consecutive integers 1:n, say {3,6,2,1,4,5}. My problem is to find, for each element, the number of elements to its left that are smaller than itself. So I'd like the program to return {0,1,0,0,3,4} for this example. This is what I've written in Fortran:

subroutine iterrank(n,invec,outvec,tempvec)
    implicit none

    integer :: n, i, currank
    integer, dimension(n) :: invec, outvec, tempvec

    tempvec = 0
    outvec = 0
    do i = 1,n
        currank = invec(i)
        outvec(i) = tempvec(currank)
        tempvec(currank:n) = tempvec(currank:n) + 1
    end do

    return
end subroutine

It takes a temporary array (vector), and for each digit d the loop comes across, it adds 1 to every element beyond position d in the temporary vector. The next iteration then takes the appropriate element in the temporary vector as the count of elements smaller than itself. My questions are:

1) I believe this is of complexity O(n^2), since there are O(n) writes to the temporary vector in each iteration of the loop. Am I correct?

2) Is there a more efficient way of doing this for large n (say, >100k)?


Solution

  • I believe this would be more efficient, and you could also reduce the temporary integer array to a single byte.

    subroutine iterrank(n,invec,outvec,tempvec)
        implicit none
    
        integer :: n, i, currank
        integer, dimension(n) :: invec, outvec, tempvec
    
        tempvec = 0
        !outvec = 0 ! no need to initialize something overwritten below
        do i = 1 , n
            currank = invec(i)
            outvec(i) = sum( tempvec(1:currank) )
            tempvec(currank) = 1
        end do
    
    end subroutine
    

    The gain is that you are only writing twice per index, however you are reading elements a maximum of n*n times.

    EDIT:

    I haven't tried this, but it should do less reads, with a possible overhead of branching. It is possibly faster for extremely large arrays, I would however expect it to be slower for short arrays:

    subroutine iterrank(n,invec,outvec,tempvec)
      implicit none
    
      integer :: n, i, currank, prevrank
      integer, dimension(n) :: invec, outvec, tempvec
    
      tempvec = 0
      outvec(1) = 0
      tempvec(invec(1)) = 1
      do i = 2 , n
         prevrank = invec(i-1)
         currank = invec(i)
         if ( abs(prevrank-currank) > currank ) then
            outvec(i) = sum( tempvec(1:currank) )
         else if ( prevrank < currank ) then
            outvec(i) = outvec(i-1) + sum( tempvec(prevrank:currank) )
         else
            outvec(i) = outvec(i-1) - sum( tempvec(currank:prevrank-1) )
         end if
         tempvec(currank) = 1
      end do
    
    end subroutine iterrank