Search code examples
pythonmysqlsqlpython-3.xpymysql

How to limit list in python to show N records for each unique row?


I am trying to limit mysql output of query to show only Top N records for each genre. This is my code :

def selectTopNactors(n):

# Create a new connection
con=connection()

# Create a cursor on the connection
cur=con.cursor()
#execute query
int(n)
sql ="""SELECT g.genre_name, a.actor_id,COUNT(mg.genre_id) as num_mov
FROM actor as a, role as r,movie as m,genre as g, movie_has_genre as mg
WHERE a.actor_id = r.actor_id AND m.movie_id = r.movie_id
      AND m.movie_id = mg.movie_id AND g.genre_id = mg.genre_id
      AND (g.genre_id, m.movie_id) IN (SELECT g.genre_id, m.movie_id
       FROM movie as m, genre as g, movie_has_genre as mg
       WHERE m.movie_id = mg.movie_id AND mg.genre_id = g.genre_id 
       ORDER BY g.genre_id)
       GROUP BY g.genre_name, a.actor_id
       ORDER BY g.genre_name, COUNT(*) desc """

cur.execute(sql)

results = cur.fetchall()


listab = []
listac = []
for row in results:
     lista = []
     lista.append(row[0])
     lista.append(row[1])
     lista.append(row[2])
     listab = tuple(lista)
     listac.append(listab)
head = ("genreName","actorId","numberOfMovies")    
listac.insert(0,head) 


print (n)
con.commit()
return listac

And the list which is returning it is huge (6000+) records, So I want to show only N records for each genre. returned list is here


Solution

  • In version of MySQL before 8.0, we can emulate analytic functions using user-defined variables in carefully crafted queries. Note that we are depending on behavior of user-defined variables that is not guaranteed (documented in the MySQL Reference Manual).

    SELECT @rn := IF(c.genre_name=@prev_genre,@rn+1,1) AS rn
         , @prev_genre := c.genre_name                 AS genre_name
         , c.actor_id                                  AS actor_id
         , c.num_mov                                   AS num_mov
      FROM ( SELECT @prev_genre := NULL, @rn := 0 ) i
     CROSS 
      JOIN ( SELECT g.genre_name
                  , a.actor_id
                  , COUNT(1) AS num_mov
               FROM actor a
               JOIN role r
                 ON r.actor_id = a.actor_id
               JOIN movie m
                 ON m.movie_id = r.movie_id
               JOIN movie_has_genre mg
                 ON mg.movie_id = m.movie_id
               JOIN genre g
                 ON g.genre_id = mg.genre_id
              GROUP
                 BY g.genre_name
                  , a.actor_id
              ORDER
                 BY g.genre_name
                  , COUNT(1) DESC
                  , a.actor_id
           ) c
     ORDER
        BY c.genre_name
         , c.num_mov DESC
         , c.actor_id
    HAVING rn <= 4
    

    The literal 4 at the end of the query represents the value N in the question.

    In MySQL 8.0, we can use the newly introduced analytic functions, to obtain an equivalent result:

    SELECT ROW_NUMBER() OVER(PARTITION BY c.genre_name ORDER BY c.num_mov DESC, c.actor_id)
           AS rn
         , c.genre_name                                 AS genre_name
         , c.actor_id                                   AS actor_id
         , c.num_mov                                    AS num_mov
      FROM ( SELECT g.genre_name
                  , a.actor_id
                  , COUNT(1) AS num_mov
               FROM actor a
               JOIN role r
                 ON r.actor_id = a.actor_id
               JOIN movie m
                 ON m.movie_id = r.movie_id
               JOIN movie_has_genre mg
                 ON mg.movie_id = m.movie_id
               JOIN genre g
                 ON g.genre_id = mg.genre_id
              GROUP
                 BY g.genre_name
                  , a.actor_id
              ORDER
                 BY g.genre_name
                  , COUNT(1) DESC
                  , a.actor_id
           ) c
     ORDER
        BY c.genre_name
         , c.num_mov DESC
         , c.actor_id
    HAVING rn <= 4