Search code examples
mysqlgroup-bylimitdatabase-performance

MySQL sum ranking, group by genre, limit 10


I have MySQL tables like these and I would like to calculate the TOP10 for each genre:

  • rankings_2016 (trackId, genreId, ranking, timestamp)
  • genres (genreId, genreName)
  • tracks (trackId, trackName, genreId)
  • artists (artistId, artistName)
  • artists_tracks (artistId, trackId)

I would like to get TOP10 rankings for each genre, for every track and for every artist.

A track or an artist could have up to 2 genres. Ranking could be the same. Just to get the idea with LIMIT 2:

  genreId | trackId | ranking
 ---------------------------------
   0         1111      100
   0         2222       99
   1         1111      100
   1         2222       99

  genreId | artistId | ranking
 ---------------------------------
   0         1111      100
   0         2222       99
   1         1111      100
   1         2222       99

The only solution I found is getting everything in a table and then LIMIT 10 in the page, but it's killing my database in terms of size (I have limited resources).

For the tracks I wrote this:

SELECT trackId, genreId, @newRank := SUM(ranking) as ranking
FROM rankings_2016
WHERE timestamp >= ( select unix_timestamp('2016-01-01') )
AND timestamp <= ( select unix_timestamp('2016-12-31') )
GROUP BY trackId, genreId

For the artists:

SELECT artistId, genreId, @newRank := SUM(a1.ranking) as ranking
FROM rankings_2016 a1
LEFT JOIN artists_tracks a2
ON a1.trackId = a2.trackId
WHERE timestamp >= ( select unix_timestamp('2016-01-01') )
AND timestamp <= ( select unix_timestamp('2016-12-31') )
GROUP BY artistId, genreId

Thank all in advance for your hints.


UPDATE

The logic in general (and accepted reply) requires good indexes and performant server.

ARTISTS in my case failed with error 500 unless I increased CPU. In general replacing LEFT with INNER saves 1 second.


Solution

  • Consider a correlated count subquery to rank order the rankings by Artist / Track / Genre groupings. Then use this rank calculated column in outer query to filter for top 10 per grouping:

    Artist Ranking (top 10 rankings per artist and genre)

    SELECT main.artistId, main.genreId, main.ranking
    FROM
     (
       SELECT a.artistId, r.genreId, r.ranking,
              (SELECT COUNT(*) FROM rankings_2016 subr
               LEFT JOIN artists_tracks suba ON subr.trackId = suba.trackId
               WHERE suba.artistId = a.artistId
               AND subr.genreId = r.genreId
               AND subr.ranking >= r.ranking) AS rn
       FROM rankings_2016 r
       LEFT JOIN artists_tracks a ON r.trackId = a.trackId
       WHERE r.timestamp BETWEEN ( select unix_timestamp('2016-01-01') )
                             AND ( select unix_timestamp('2016-12-31') ) 
     ) AS main
    
    WHERE main.rn <= 10
    

    Track Ranking (top 10 rankings per track and genre)

    SELECT main.trackId, main.genreId, main.ranking
    FROM
     (
       SELECT r.trackId, r.genreId, r.ranking,
              (SELECT COUNT(*) FROM rankings_2016 subr               
               WHERE subr.genreId = r.genreId
               AND subr.trackId = r.trackId
               AND subr.ranking >= r.ranking) AS rn
       FROM rankings_2016 r
       WHERE r.timestamp BETWEEN ( select unix_timestamp('2016-01-01') )
                             AND ( select unix_timestamp('2016-12-31') ) 
     ) AS main
    
    WHERE main.rn <= 10