Search code examples
algorithmlistsortingnumericscoring

Sort list of anything by relations


In an application I'm doing, I am given a list of items, and relations between them and I need to calculate a sorted list. For fun, lets use an exam scoring example, we know that:

  1. John has done better than Jane

  2. Jane has done worse than Luke

  3. Luke has done better than Jason

  4. Jason has done worse than Tim

For this list, I need to calculate the following sorted list:

  1. John
  2. Luke
  3. Jane
  4. Tim
  5. Jason

How do I derive an item's position in the sorted list, based on the contested relations?


Solution

  • Fundamentally, sorting require a linear order.

    You need your relation to be a linear order, which means :

    1. a ≤ a (reflexivity)
    2. if a ≤ b and b ≤ a then a = b (antisymmetry)
    3. if a ≤ b and b ≤ c then a ≤ c (transitivity)
    4. for all a and b, a ≤ b or b ≤ a (totality)

    So, what does it mean? Most usual relations are linear. For example, if you are talking about grades, which are in fact real numbers, then yes, grades can be linearly ordered. Rankings too, as they again are only real numbers.

    But for example, if you're ranking chess players, you cannot use their match history. If player A has beaten player B in the majority of their match, and B has beaten C again in the majority of their match, this does NOT mean that A > B > C, simply because C may have beaten A in the majority of their match.

    This is by the way one of the reasons why Elo ratings exist, they needed a linear order to sort who's the best player, and notice how Elo is basically just a real number, thus has a linear order.

    So, for exam scoring, you simply do NOT have that linear order. Why? It's real numbers! Well, true, you have this linear order, but you cannot ask this linear order everything, You're restricted to the predefined comparison list, which is here :

    1. John has done better than Jane
    2. Jane has done worse than Luke
    3. Luke has done better than Jason
    4. Jason has done worse than Tim

    So, the answer is : No, in general, you can't sort a list if you do not have the comparison between any pair of elements.

    Now, I'm working on a "least bad" answer, but it's not trivial to me...


    Edit : OK, I came up with something. The idea is that you are going to extract the most possible information on your predefined comparison list. Then treat any comparison that is not in your extended comparison list as an equality.

    So, how do you do it?

    First, for any a < b comparison, add the entry b > a. Then, you will look through your predefined comparison list for any comparison a < b and b < c and add (if it's already not in here) all corresponding a < c (transitivity) and c > a. If you know there are no equalities (no students have the same grade), it's finished. Else, you have to add all the reflexivity and antisymmetry relations. You can't do anything about totality. Now, you have your extended comparison list.

    Now, take your favourite sorting algorithm. If you don't have one, quicksort is nice, efficient and short to write. Then everytime the algorithm asks for a comparison between a and b, instead, look at your extended comparison list. If the comparison is here, great, else, treat the comparison as a=b. (Note that it doesn't matter if you know that there are no equalities, the algorithm doesn't care)

    The result will be a "least badly" sorted list. There are usually several possible "least badly" sorted list, but this algorithm will give you just one.


    So, let's give an example. It's nearly the same as the one given int the OP, with a slight difference ("John has done worse than Luke" instead of "Jane has done worse than Luke"). so, we start with :

    1. John has done better than Jane
    2. John has done worse than Luke
    3. Luke has done better than Jason
    4. Jason has done worse than Tim

    Now, for every "X is worse than Y", I add "Y is better than X", and vice-versa. The new sentences are in bold :

    1. John has done better than Jane
    2. Luke has done better than John
    3. Luke has done better than Jason
    4. Tim has done better than Jason
    5. Jane has done worse than John
    6. John has done worse than Luke
    7. Jason has done worse than Luke
    8. Jason has done worse than Tim

    Finally, I scan all the possible sentences "X has done better then Y" and "Y has done better than Z", and I add "X has done better than Z"

    1. John has done better than Jane
    2. Luke has done better than John
    3. Luke has done better than Jason
    4. Tim has done better than Jason
    5. Luke has done better than Jane
    6. Jane has done worse than John
    7. John has done worse than Luke
    8. Jason has done worse than Luke
    9. Jason has done worse than Tim
    10. Jane has done worse than Luke

    The extended table is finished.

    Let's now take a look at the pseudocode of quicksort :

      function quicksort('array')
          if length('array') ≤ 1
              return 'array'  // an array of zero or one elements is already sorted
          select and remove a pivot value 'pivot' from 'array'
          create empty lists 'less' and 'greater'
          for each 'x' in 'array'
              if 'x' ≤ 'pivot' then append 'x' to 'less'
              else append 'x' to 'greater'
          return concatenate(quicksort('less'), 'pivot', quicksort('greater'))
    

    The only difference with the standard quicksort is that you cannot generally know the answer to the question if('x' ≤ 'pivot') .... So instead, if x=Luke and pivot = Tim, you look in your table for the sentence "Luke has done worse than Tim". If you find it, then you consider that the answer is true and do append 'x' to 'less'. If you find in the table "Luke has done better than Tim", then you consider that the answer is false, and you do append 'x' to 'greater'. And finally, if you can't find any of the two sentences mentionned above, you act as if 'x' == 'pivot', and you do append 'x' to 'less'