
Glicko-2 Rating System: Bug or exploit?


Glicko-2 is a rating system used in chess, but it can be applied in many other situations. Glicko-2 is an improvement on Glicko-1, which addressed problems of the older Elo rating system.

Like version 1, Glicko-2 gives a player a higher rating deviation (RD) the longer they have been inactive. What makes Glicko-2 special in comparison is that it adds a per-player volatility, which controls how fast the RD grows from one rating period to the next, and a system constant that limits how much the volatility itself can change over time.
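To make the inactivity effect concrete: in the Glicko-2 write-up, a player who plays no games in a rating period keeps the same rating and volatility, while the RD (on the internal scale, phi) becomes sqrt(phi^2 + sigma^2). The following is a minimal sketch of that single step repeated over several idle periods. The function name `rd_after_inactivity` is made up for illustration, and the sketch assumes the volatility stays constant and that no upper cap is applied to the RD.

```python
import math

SCALE = 173.7178  # Glicko-2 factor between rating units and the internal scale


def rd_after_inactivity(rd, volatility, idle_periods):
    """RD after `idle_periods` rating periods without games.

    Assumes constant volatility and no cap on RD (illustrative only).
    """
    phi = rd / SCALE
    for _ in range(idle_periods):
        # Only the "pre-rating-period" step applies when no games are played.
        phi = math.sqrt(phi * phi + volatility * volatility)
    return phi * SCALE


if __name__ == "__main__":
    for months in (0, 6, 12, 24):
        print(months, round(rd_after_inactivity(50.0, 0.06, months), 1))
```

The growth is slow for a default volatility of 0.06, so how quickly an idle player's uncertainty returns depends mostly on how long a rating period is chosen to be.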

An example write-up from the author can be found here: http://www.glicko.net/glicko/glicko2.pdf.
Within this document, he explains:

The Glicko-2 system works best when the number of games in a rating period is moderate to large, say an average of at least 10-15 games per player in a rating period. The length of time for a rating period is at the discretion of the administrator.

Assuming that a group of active chess players plays 10-15 games on average in a one-month period, the administrator would then update ratings at the end of every month.


I needed a PHP Implementation of the Glicko-2 rating system and came across the following:

Glicko-2 JavaScript Implementation

  • The JavaScript version had a small error that kept it from matching the example in the technical write-up; its author found the result close enough and didn't bother to debug it.

Glicko-2 PHP Implementation

  • The PHP implementation had many bugs, but they weren't apparent unless you ran more than one rating period (for which the technical write-up never shows expected values).

Glicko-2 Calculator in Excel

  • Finally, the Excel calculator, done by someone in the chess community, seemed to be error-free and the most professional. Once the JavaScript bug was fixed, the JavaScript version and the Excel calculator matched each other very closely (not perfectly, but the difference could be rounding error).

I fixed the bugs I could find in the PHP and JavaScript versions (and submitted issues/patches to the authors) so that they match the Excel calculator as closely as possible.


Now I am 99% confident that I have an accurate Glicko-2 implementation (between the three of them) for analysis, and that is when I came across something strange, which is the topic of this discussion.

Given the suggested default for Glicko-2 for a new player:

Rating:      1500
RD:           350
Volatility:  0.06

If you beat an average opponent of rating 1378 and RD 99 (Source) only once every rating period (1 month) for the next 12 periods (1 year), you will have accumulated a rating of 1852, which falls in National Class A (1800-1999), when in reality you have only beaten 12 average-rated players over a span of 12 months.

Month   Rating      RD      Volatility      Class
1       1625        259     0.059999        National Class B
2       1682        225     0.059998        〃
3       1718        205     0.059997        〃
6       1784        174     0.059994        〃
12      1852        148     0.059988        National Class A
24      1922        127     0.059976        〃

If you beat 2 average opponents every rating period, you can get to National Class A in about 4-5 months, having faced only 8-10 average opponents.

Month   Rating      RD      Volatility      Class
1       1672        215     0.059999        National Class B
2       1733        183     0.059997        〃
3       1770        166     0.059995        〃
4       1797        154     0.059993        〃
5       1819        146     0.059992        National Class A
6       1836        140     0.059991        〃
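For anyone who wants to cross-check the two tables, here is a compact sketch of the Glicko-2 update as I read it from the technical write-up, plus a small loop that replays the "always beat the same average opponent" scenario. Several things are assumptions on my part: the system constant `tau=0.5` (the value used in the write-up's example), the opponent staying fixed at 1378 / RD 99 in every period, and the helper names `rate` and `simulate`.

```python
import math

SCALE = 173.7178  # conversion factor from the Glicko-2 write-up


def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def new_volatility(phi, sigma, v, delta, tau, eps=1e-6):
    """Iterative volatility update (regula falsi / Illinois variant)."""
    a = math.log(sigma * sigma)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta * delta - phi * phi - v - ex)
        den = 2.0 * (phi * phi + v + ex) ** 2
        return num / den - (x - a) / (tau * tau)

    A = a
    if delta * delta > phi * phi + v:
        B = math.log(delta * delta - phi * phi - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    return math.exp(A / 2.0)


def rate(rating, rd, sigma, games, tau=0.5):
    """One rating period. `games` is a list of (opp_rating, opp_rd, score)."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not games:  # idle period: only the RD grows
        return rating, SCALE * math.sqrt(phi * phi + sigma * sigma), sigma
    v_inv = total = 0.0
    for opp_rating, opp_rd, score in games:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        e = 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))
        v_inv += g(phi_j) ** 2 * e * (1.0 - e)
        total += g(phi_j) * (score - e)
    v = 1.0 / v_inv
    sigma_new = new_volatility(phi, sigma, v, v * total, tau)
    phi_star = math.sqrt(phi * phi + sigma_new ** 2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * total
    return SCALE * mu_new + 1500.0, SCALE * phi_new, sigma_new


def simulate(periods, wins_per_period, opponent=(1378.0, 99.0)):
    """New player (1500 / 350 / 0.06) who wins every game; opponent assumed fixed."""
    r, rd, s = 1500.0, 350.0, 0.06
    history = []
    for _ in range(periods):
        r, rd, s = rate(r, rd, s, [(opponent[0], opponent[1], 1.0)] * wins_per_period)
        history.append((r, rd, s))
    return history


if __name__ == "__main__":
    for month, (r, rd, s) in enumerate(simulate(12, 1), start=1):
        print(month, round(r), round(rd), round(s, 6))
```

Calling `simulate(6, 2)` replays the two-wins-per-period case. If the sketch is right, the early rows should land close to the tabulated values; small differences in later rows could come from a different system constant or from rounding.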


Are these assumptions accurate? Is there a bug in my calculator?

If it is not a bug, what are some ways of countering this besides:

  • Consider the "true rating" to be the lower bound of the deviation (Rating - RD)
  • Do not show inactive users' ratings
  • Do not show users with fewer than N games
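As a rough illustration of the first and third ideas, the sketch below ranks players by a conservative rating and hides anyone with too few games. The Glicko-2 write-up suggests reporting an interval of the rating plus or minus twice the RD, so the default here subtracts 2 × RD; using k = 1 gives the plain "Rating - RD" variant. The dictionary fields (`name`, `rating`, `rd`, `games`), the `min_games` threshold, and the sample players are all hypothetical.

```python
def conservative_rating(rating, rd, k=2.0):
    """Lower bound of the rating interval: rating - k * RD."""
    return rating - k * rd


def leaderboard(players, min_games=10, k=2.0):
    """Hide players with fewer than `min_games` games, sort by conservative rating."""
    shown = [p for p in players if p["games"] >= min_games]
    return sorted(
        shown,
        key=lambda p: conservative_rating(p["rating"], p["rd"], k),
        reverse=True,
    )


if __name__ == "__main__":
    players = [
        {"name": "streak", "rating": 1852, "rd": 148, "games": 12},    # month-12 row above
        {"name": "regular", "rating": 1700, "rd": 50, "games": 200},   # made-up active player
        {"name": "newcomer", "rating": 1625, "rd": 259, "games": 1},   # month-1 row above
    ]
    for p in leaderboard(players):
        print(p["name"], conservative_rating(p["rating"], p["rd"]))
```

With these sample numbers the 12-win streak player is displayed at 1556 instead of 1852 and sorts below a well-established 1700 player, which is the kind of damping the list above is after.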

Solution

  • It may seem counter-intuitive, but this is actually a correct result. If you continuously play average players and always win, regardless of the time periods, you're demonstrating that you have a high rating (not an average rating, even though your opponents are average). A player who is average (has a 'true' average rating) and plays opponents of exactly the same 'true' rating should win and lose about 50% of the time. A player with a very high 'true' rating will win a larger percentage of games against average players; exactly how large depends on how far apart the ratings are, but let's say the rating is high enough that they should win 90% of the time. That means for every 10 games played against an average player, this highly rated player should lose 1 of them.

    What you've effectively modeled is a player whose rating is high enough to win every single game against an average player (12 or 24 games without a loss), which means their rating will keep going up for as long as they keep winning, because they've never lost. They're demonstrating an ability that (until a loss happens) implies a rating separation large enough to approach an expected win ratio of 100%.
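To put rough numbers on this, the sketch below uses the expected-score formula from the Glicko-2 write-up, which depends on the rating gap and on the opponent's RD. The function names are made up for illustration, and the 1378 / RD 99 opponent is taken from the question.

```python
import math

SCALE = 173.7178  # Glicko-2 factor between rating units and the internal scale


def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def expected_score(rating, opp_rating, opp_rd):
    """Expected score (win probability, ignoring draws) against one opponent."""
    mu_diff = (rating - opp_rating) / SCALE
    return 1.0 / (1.0 + math.exp(-g(opp_rd / SCALE) * mu_diff))


def gap_for_expected_score(p, opp_rd):
    """Rating gap at which the expected score equals p (0 < p < 1)."""
    return SCALE * math.log(p / (1.0 - p)) / g(opp_rd / SCALE)


if __name__ == "__main__":
    print(expected_score(1500, 1378, 99))   # the new player at the start
    print(expected_score(1852, 1378, 99))   # after 12 straight wins
    print(gap_for_expected_score(0.90, 99))
```

Under these assumptions, even at 1852 the model still expects an occasional loss to the 1378 player, so every additional win is mild evidence that the rating should be higher still, and the rating keeps creeping up until a loss arrives.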