Search code examples
algorithmsortingranking

Ranking - minimizing pair-wise comparison


I have a list of 100 items and I want generate a ranked list, the user is asked to rate pairs of items, prompting: "Do you prefer item A or item B instead?". Since it would be really tiring asking every possible combination of those items. Eg. if item 1 is prefered over item 2, and item 2 is preferred over item 3, it's not necessary to compare item 1 and item 3.

I'm looking for a algorithm that reduce such comparisons and will give a complete ranked list.

If the number of comparisons would be improved by using 3-pairs of items instead, it would be prefered that way. In this case the user would rate the best and the worst of the 3 pairs, thus ranking 3 items per 'turn'.


Solution

  • There are 100! possible orders. 2^524 < 100! < 2^525 so the best you can ever do is 525 binary questions.

    The 3 question version does better. It can give 6 possible answers, so the best you can ever do by the same analysis is 204 questions.

    It is unlikely that you can easily get to those bounds. But you can get close.

    For binary comparisons, it will be hard to do better than the Ford-Johnson Merge-Insertion Sort.

    With 3-way comparisons, a truly optimal answer will be a fun research project. But you can come close with a greedy algorithm by finding 3 elements such that the most likely order is as unlikely as possible.

    An exact calculation will be too hard. But a greedy approach goes as follows. For each element, calculate how many others are known to be better, and how many are known to be worse. Sort them into a priority list by, "fewest known comparisons" then "smallest difference between better/worse" then "fewest known to be worse" then "original order".

    Take the first element in this priority list.. Look down the list for the first that it has never been compared with. Continue down looking for the first that has never been compared with either. If you fail to find one, instead look for one that has never been compared with the first but may have been compared with the second. Failing that, just compare the pair.

    An example may help. Let's try to sort 2, 4, 1, 3, 5 with 3 way comparisons. With no information, we at first just take the order we were given. We start by comparing 2, 4, 1 and find out that that is 1, 2, 4. At this point we know:

    1: better: 2, 4; worse:
    2: better: 4; worse: 1
    3: better: ; worse:
    4: better: ; worse: 1, 2
    5: better: ; worse:
    

    Now we sort by fewest comparisons, then smallest difference between number of best and worse, then fewest worse, then original order. This gives 3, 5, 2, 1, 4. None of 3, 5, 2 have been compared, so we do that next to get 2, 3, 5. We now know:

    1: better: 2, 3, 4, 5; worse:
    2: better: 3, 4, 5; worse: 1
    3: better: 5; worse: 1, 2
    4: better: ; worse: 1, 2
    5: better: ; worse: 1, 2, 3
    

    And now our priority sort gives us 4, 3, 5, 2, 1. We take 4, 5. Everything has been compared to one of them so we go to 3 because it has not been compared to 4 yet. These three compare 3, 4, 5. We now know:

    1: better: 2, 3, 4, 5; worse:
    2: better: 3, 4, 5; worse: 1
    3: better: 4, 5; worse: 1, 2
    4: better: 5; worse: 1, 2, 3
    5: better: ; worse: 1, 2, 3, 4
    

    And we sorted 5 things with 3 3-way comparisons. Which is theoretically the best that we could do. (As you increase the number, I doubt you will often do as well. But it should beat a binary strategy.)