Alright, so I am trying to rework the GitHub contribution graph feature, and I would like each day's contribution "level" to match GitHub's. By level I mean the brightness of the square (in dark mode).
In this image you can see a day with a high level and a day with a low level.
For starters, the level is calculated relative to the other days, but it is unclear how that calculation is done. The GitHub Docs say that the level is based on which quartile the day falls into.
I don't know much about statistics, but shouldn't there be an equal number of days in each quartile? Yet if you look at this contribution graph, you can clearly see that many more days have a low level than a high one.
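For reference, here is what a true rank-based quartile split looks like. This is a minimal Python sketch (quartile_group_sizes is a hypothetical name) that computes the three cut points over the nonzero days with statistics.quantiles and counts how many days land in each group. With distinct counts every group gets a quarter of the days, which is the intuition behind the question; many tied counts can unbalance the groups somewhat.

```python
import bisect
import statistics


def quartile_group_sizes(counts: list[int]) -> list[int]:
    """Count how many nonzero days fall into each rank-based quartile.

    Needs at least two nonzero days (statistics.quantiles requirement).
    """
    nonzero = sorted(c for c in counts if c > 0)
    cuts = statistics.quantiles(nonzero, n=4)  # three cut points: Q1, Q2, Q3
    sizes = [0, 0, 0, 0]
    for c in nonzero:
        # bisect_left: a value equal to a cut point goes into the lower group
        sizes[bisect.bisect_left(cuts, c)] += 1
    return sizes


if __name__ == "__main__":
    print(quartile_group_sizes(list(range(1, 13))))  # [3, 3, 3, 3]
```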
Is there something I am missing? Am I wrong about quartiles? Any help is appreciated.
I was just as confused when I looked into creating a GitHub Action/CLI to draw arbitrary text in my contribution graph. I couldn't find anything online that explained how quartiles squared with what was being drawn, and the existing tools didn't seem to understand the calculation either: they often failed to produce pixel-accurate graphs when the account already had commits. As you already alluded to, how could more than a quarter of days be painted a single color if GitHub were using quartiles?
I had some ideas about how the calculation might actually work, so I wrote some code to query a GitHub user's contribution history and draw the contribution graph from my model. I then spot-checked the output against the graph GitHub actually rendered for that user.
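The querying step could look something like the minimal sketch below. It assumes GitHub's GraphQL API and its contributionsCollection → contributionCalendar → weeks → contributionDays fields (double-check the names against the current schema), and it needs a personal access token. fetch_calendar and flatten_days are hypothetical helper names. Each day in the response also appears to carry a contributionLevel, which could serve as a reference when spot-checking a model instead of comparing against the rendered graph by eye.

```python
import json
import urllib.request

# Query for one user's contribution calendar (field names assumed from GitHub's GraphQL schema).
CALENDAR_QUERY = """
query($login: String!) {
  user(login: $login) {
    contributionsCollection {
      contributionCalendar {
        weeks {
          contributionDays { date contributionCount contributionLevel }
        }
      }
    }
  }
}
"""


def fetch_calendar(login: str, token: str) -> dict:
    """POST the query to GitHub's GraphQL endpoint and return the decoded JSON."""
    body = json.dumps({"query": CALENDAR_QUERY, "variables": {"login": login}}).encode()
    req = urllib.request.Request(
        "https://api.github.com/graphql",
        data=body,
        headers={"Authorization": f"bearer {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def flatten_days(response: dict) -> list[tuple[str, int, str]]:
    """Turn the nested week/day structure into (date, count, level) tuples, in API order."""
    calendar = response["data"]["user"]["contributionsCollection"]["contributionCalendar"]
    return [
        (day["date"], day["contributionCount"], day["contributionLevel"])
        for week in calendar["weeks"]
        for day in week["contributionDays"]
    ]
```

Keeping the network call separate from flatten_days means the parsing (and any model built on top of it) can be tested against a saved response without hitting the API.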
Based on my (limited) experimentation, the word "quartile" is a bit misleading (not sure if intentionally so): daily contributions are not divided into four groups each containing ~25% of the data points. Instead, it seems that the range (0, max_daily_contribution_count] is divided into four equal-width buckets, each spanning q = max_daily_contribution_count / 4, and each bucket can contain any number of days. A day's color is then determined by which range its contribution count falls into, from lightest to darkest: [0, 1), [1, q), [q, q*2), [q*2, q*3), [q*3, ∞).
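The bucket model above can be sketched in Python as follows. contribution_level and levels_for are hypothetical names, the thresholds are checked from the top down, and outlier removal is not modeled. Running it on a made-up set of days with one very busy day shows how a single large maximum pushes nearly every other active day into the lightest level, which is consistent with the lopsided graph in the question.

```python
def contribution_level(count: int, max_count: int) -> int:
    """Map a day's count to a level 0-4 using equal-width buckets of size max/4.

    Thresholds are checked from the top down, so when q <= 1 the [1, q)
    bucket is empty and level 1 never appears.
    """
    if count < 1 or max_count < 1:
        return 0  # [0, 1): no contributions
    q = max_count / 4  # width of each bucket
    if count >= 3 * q:
        return 4  # [q*3, inf)
    if count >= 2 * q:
        return 3  # [q*2, q*3)
    if count >= q:
        return 2  # [q, q*2)
    return 1  # [1, q)


def levels_for(counts: list[int]) -> list[int]:
    """Compute the level of every day relative to the busiest day."""
    max_count = max(counts, default=0)
    return [contribution_level(c, max_count) for c in counts]


if __name__ == "__main__":
    days = [0, 0, 1, 1, 2, 1, 3, 0, 20]
    print(levels_for(days))  # [0, 0, 1, 1, 1, 1, 1, 0, 4]
```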
This comment also mentions that they remove outliers before calculating the "quartiles", but I didn't try to determine how.
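Since the outlier rule is unknown, the sketch below is purely an assumption: it uses Tukey's fence (drop nonzero counts above Q3 + 1.5 × IQR) to pick the maximum that the buckets would be sized from. max_without_outliers is a hypothetical name, and GitHub may use a different rule entirely. Under the max/4 model, days above the trimmed maximum would still land in the top [q*3, ∞) bucket.

```python
import statistics


def max_without_outliers(counts: list[int], k: float = 1.5) -> int:
    """Largest nonzero count not above Tukey's upper fence, Q3 + k * IQR.

    Hypothetical outlier rule for illustration; GitHub's actual rule is unknown.
    """
    nonzero = sorted(c for c in counts if c > 0)
    if len(nonzero) < 2:
        return max(nonzero, default=0)  # too few points for quartiles
    # "inclusive" keeps the cut points within the observed data range
    q1, _, q3 = statistics.quantiles(nonzero, n=4, method="inclusive")
    upper_fence = q3 + k * (q3 - q1)
    return max(c for c in nonzero if c <= upper_fence)


if __name__ == "__main__":
    print(max_without_outliers([0, 1, 1, 2, 2, 3, 3, 4, 50]))  # 4
```

In that example the single 50-contribution day is treated as an outlier, so the buckets would be sized from 4 (q = 1) instead of 50 (q = 12.5).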
Hope that helps!