
Python Trueskill (ELO) scores drift down


Why is it that, in a random population with random winners, the scores slowly drift toward 0? I get that the scores might be random, but why do they always drift negatively?

import trueskill as ts
from random import choice

r = []
for i in range(10):
    r.append(ts.Rating())


def avg(r):
    ratings = [(a.mu, a.sigma) for a in r]
    mus = list(zip(*ratings))[0]
    sigmas = list(zip(*ratings))[1]
    avg_mu = sum(mus) / float(len(mus))
    avg_sigma = sum(sigmas) / float(len(sigmas))
    return avg_mu, avg_sigma


for j in range(20000):
    p1_ix = choice(range(len(r)))
    p2_ix = choice(range(len(r)))
    p1 = r[p1_ix]
    p2 = r[p2_ix]
    r[p1_ix], r[p2_ix] = ts.rate_1vs1(p1, p2)
    if not j % 1000:
        print(avg(r))

Solution

  • You're using the TrueSkill algorithm, not ELO. TrueSkill has a different method for updating skill ratings. While ELO is a zero-sum system, TrueSkill relies on an uncertainty parameter (determined by both the number of games played and their outcomes) to adjust skill ratings. Therefore, TrueSkill is only zero-sum if the two players in a match have equal uncertainty values (the short sketch at the end of this answer illustrates this).

    The simulation you ran creates situations where players end up with vastly different uncertainty parameters. This, coupled with the fact that you are violating the assumption that higher-rated players win more often, will result in some strange behavior. A better simulation would be to run several round-robin schedules, so that the number of games played is similar for every matchup. If you run the code below, the average rating will stay close to 25.

    import itertools
    import trueskill as ts
    import numpy as np
    
    r = []
    for i in range(10):
        r.append(ts.Rating())
    
    
    def avg(r):
        ratings = [(a.mu, a.sigma) for a in r]
        mus = list(zip(*ratings))[0]
        sigmas = list(zip(*ratings))[1]
        avg_mu = sum(mus) / float(len(mus))
        avg_sigma = sum(sigmas) / float(len(sigmas))
        return avg_mu, avg_sigma
    
    
    for j in range(4444):
        # Create array of all possible matchup combinations
        possible_matches = np.array(list(itertools.combinations(list(range(len(r))), 2)))
        # Shuffle the matches to create a random-order round-robin schedule
        np.random.shuffle(possible_matches)
        for match in possible_matches:
            # Shuffle the order of the players in each match to randomize the result
            np.random.shuffle(match)
            p1_ix = match[0]
            p2_ix = match[1]
            p1 = r[p1_ix]
            p2 = r[p2_ix]
            r[p1_ix], r[p2_ix] = ts.rate_1vs1(p1, p2)
        if j % 222 == 0:
            print(avg(r))
    

    Also, it's important to note that TrueSkill ratings aren't bounded below by 0, so your simulation will eventually produce negative scores if you run it long enough. But I can't fully explain why it always drifts negatively; intuitively, I would expect it to drift positively or negatively with equal probability. My guess is that there is a damping factor of some sort that makes it less likely for a player to randomly run away with an absurdly high skill rating.
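
    To make the non-zero-sum point concrete, here is a small sketch using the same trueskill package (the specific sigma values are only illustrative): with equal uncertainties the winner's gain mirrors the loser's loss, while with unequal uncertainties the two mu changes differ in size.

    import trueskill as ts

    # Equal uncertainty: the winner's mu gain equals the loser's mu loss,
    # so the pair's combined mu is conserved (zero-sum for this match).
    a, b = ts.Rating(), ts.Rating()
    new_a, new_b = ts.rate_1vs1(a, b)
    print(new_a.mu - a.mu, new_b.mu - b.mu)

    # Unequal uncertainty (illustrative values): the high-sigma player's mu
    # moves much further than the low-sigma player's, so the pair's
    # combined mu is not conserved.
    c = ts.Rating(mu=25, sigma=8.0)   # new, uncertain player
    d = ts.Rating(mu=25, sigma=1.0)   # well-established player
    new_c, new_d = ts.rate_1vs1(c, d)
    print(new_c.mu - c.mu, new_d.mu - d.mu)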