I am trying to limit mysql output of query to show only Top N records for each genre. This is my code :
def selectTopNactors(n):
# Create a new connection
con=connection()
# Create a cursor on the connection
cur=con.cursor()
#execute query
int(n)
sql ="""SELECT g.genre_name, a.actor_id,COUNT(mg.genre_id) as num_mov
FROM actor as a, role as r,movie as m,genre as g, movie_has_genre as mg
WHERE a.actor_id = r.actor_id AND m.movie_id = r.movie_id
AND m.movie_id = mg.movie_id AND g.genre_id = mg.genre_id
AND (g.genre_id, m.movie_id) IN (SELECT g.genre_id, m.movie_id
FROM movie as m, genre as g, movie_has_genre as mg
WHERE m.movie_id = mg.movie_id AND mg.genre_id = g.genre_id
ORDER BY g.genre_id)
GROUP BY g.genre_name, a.actor_id
ORDER BY g.genre_name, COUNT(*) desc """
cur.execute(sql)
results = cur.fetchall()
listab = []
listac = []
for row in results:
lista = []
lista.append(row[0])
lista.append(row[1])
lista.append(row[2])
listab = tuple(lista)
listac.append(listab)
head = ("genreName","actorId","numberOfMovies")
listac.insert(0,head)
print (n)
con.commit()
return listac
And the list which is returning it is huge (6000+) records, So I want to show only N records for each genre. returned list is here
In version of MySQL before 8.0, we can emulate analytic functions using user-defined variables in carefully crafted queries. Note that we are depending on behavior of user-defined variables that is not guaranteed (documented in the MySQL Reference Manual).
SELECT @rn := IF(c.genre_name=@prev_genre,@rn+1,1) AS rn
, @prev_genre := c.genre_name AS genre_name
, c.actor_id AS actor_id
, c.num_mov AS num_mov
FROM ( SELECT @prev_genre := NULL, @rn := 0 ) i
CROSS
JOIN ( SELECT g.genre_name
, a.actor_id
, COUNT(1) AS num_mov
FROM actor a
JOIN role r
ON r.actor_id = a.actor_id
JOIN movie m
ON m.movie_id = r.movie_id
JOIN movie_has_genre mg
ON mg.movie_id = m.movie_id
JOIN genre g
ON g.genre_id = mg.genre_id
GROUP
BY g.genre_name
, a.actor_id
ORDER
BY g.genre_name
, COUNT(1) DESC
, a.actor_id
) c
ORDER
BY c.genre_name
, c.num_mov DESC
, c.actor_id
HAVING rn <= 4
The literal 4
at the end of the query represents the value N
in the question.
In MySQL 8.0, we can use the newly introduced analytic functions, to obtain an equivalent result:
SELECT ROW_NUMBER() OVER(PARTITION BY c.genre_name ORDER BY c.num_mov DESC, c.actor_id)
AS rn
, c.genre_name AS genre_name
, c.actor_id AS actor_id
, c.num_mov AS num_mov
FROM ( SELECT g.genre_name
, a.actor_id
, COUNT(1) AS num_mov
FROM actor a
JOIN role r
ON r.actor_id = a.actor_id
JOIN movie m
ON m.movie_id = r.movie_id
JOIN movie_has_genre mg
ON mg.movie_id = m.movie_id
JOIN genre g
ON g.genre_id = mg.genre_id
GROUP
BY g.genre_name
, a.actor_id
ORDER
BY g.genre_name
, COUNT(1) DESC
, a.actor_id
) c
ORDER
BY c.genre_name
, c.num_mov DESC
, c.actor_id
HAVING rn <= 4