Search code examples
arraysf#ranking

Ranking Function in F#


I wrote an algorithm for ranking an array.

let rankFun array =
    let arrayNew = List.toArray array
    let arrayLength = Array.length arrayNew
    let rankedArray = Array.create arrayLength 1.0
    for i in 0 .. arrayLength - 2 do
        for j in (i+1) .. arrayLength - 1 do
            if arrayNew.[i] > arrayNew.[j] then
                rankedArray.[i] <- rankedArray.[i] + 1.0

            elif arrayNew.[i] < arrayNew.[j] then
                rankedArray.[j] <- rankedArray.[j] + 1.0

            else 
                rankedArray.[i] <- rankedArray.[i] + 0.0
    rankedArray

I wanted to ask you what do you think about performance? I used for loops and I was wondering if you think there's another way better than this one. Before getting to this I was sorting my array, keeping original indexes, ranking and only afterwards resorting each rank to its original position, what was reeeeeally bad in terms of performance. Now I got to this improved version and was looking for some feedback. Any ideas?

Edit: Duplicated elements should have same rank. ;)

Thank you very much in advance. :)


Solution

  • I'm assuming that ranks can be taken from sorting the inputs, since you commented that the question's behavior on duplicates is a bug. It's surprising that the solution with sorting you described ran slower than the code shown in the question. It should be a lot faster.

    A simple way to solve this via sorting is to build an ordered set from all values. Sets in F# are always ordered and contain no duplicates, so they can be used to create the ranking.

    From the set, create a map from each value to its index in the set (plus one, to keep the ranking that starts with 1). With this, you can look up the rank of each value to fill the output array.

    let getRankings array =
        let rankTable =
            Set.ofArray array
            |> Seq.mapi (fun i n -> n, i + 1)
            |> Map.ofSeq
        array |> Array.map (fun n -> rankTable.[n])
    

    This takes an array, rather than a list, because the input parameter in the question was called array. It also uses integers for the ranks, as this is the normal data type for this purpose.

    This is much faster than the original algorithm, since all operations are at most O(n*log(n)), while the nested for-loops in the question are O(n^2). (See also: Wikipedia on Big O notation.) For only 10000 elements, the sorting-based solution already runs over 100 times faster on my computer.

    (BTW, the statement else rankedArray.[i] <- rankedArray.[i] + 0.0 appears to do nothing. Unless you're doing some sort of black magic with the optimizer, you can just remove it.)