I have a class for movies:
class Movie:
    def __init__(self,
                 title: str,
                 director: str,
                 actors: list[str]):
        self.title = title
        self.director = director
        self.actors: list[str] = actors
And a list with three example movies:
movies = [Movie('Barton Fink', 'Joel Coen', ['John Turturro', 'John Goodman', 'Judy Davis']),
          Movie('The Big Lebowski', 'Joel Coen', ['Jeff Bridges', 'John Goodman', 'Steve Buscemi', 'John Turturro']),
          Movie('The Big Easy', 'Jim McBride', ['Dennis Quaid', 'Ellen Barkin', 'John Goodman']),
          ]
I want to use pandas to count how many movies each actor appears in, something like:
John Goodman: 3
John Turturro: 2
Judy Davis: 1
...
For the directors, this works:
from pandas import DataFrame

df = DataFrame([vars(m) for m in movies])
grouped = df.groupby(['director']).size().sort_values(ascending=False)
print(grouped)
But for the actors it doesn't:
df = DataFrame([vars(m) for m in movies])
grouped = df.groupby(['actors']).size().sort_values(ascending=False)
print(grouped)
This fails because, unlike tuples or instances of user-defined classes (by default), lists aren't hashable:
Error: (<class 'TypeError'>, TypeError("unhashable type: 'list'"), <traceback object at 0x00000274C91E4340>)
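To see the hashability rule on its own, here is a small sketch. The helper `is_hashable` and the class `Dummy` are hypothetical, added only for illustration: a tuple and an instance of a plain user-defined class can be hashed, while a list raises the same `TypeError` that `groupby` reports.

```python
def is_hashable(obj) -> bool:
    """Return True if hash(obj) succeeds, i.e. obj could serve as a dict or groupby key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class Dummy:  # hypothetical class; inherits the default identity-based __hash__
    pass


actors = ['John Turturro', 'John Goodman', 'Judy Davis']

print(is_hashable(Dummy()))        # True
print(is_hashable(tuple(actors)))  # True
print(is_hashable(actors))         # False: lists are mutable, so they are unhashable
```

Note that a class which defines `__eq__` without also defining `__hash__` loses the default hash, which is why the question says "by default".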
How can I group by the list of actors?
You don't need pandas for this. You can use collections.Counter:
from collections import Counter
Counter(actor for movie in movies for actor in movie.actors)
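A self-contained version of the same idea is sketched below. It repeats the `Movie` class and sample data from the question and adds `Counter.most_common()`, which returns `(name, count)` pairs sorted from most to least frequent, so the printed output follows the format asked for. It assumes Python 3.9+ for the `list[str]` annotations.

```python
from collections import Counter


class Movie:
    def __init__(self, title: str, director: str, actors: list[str]):
        self.title = title
        self.director = director
        self.actors: list[str] = actors


movies = [Movie('Barton Fink', 'Joel Coen', ['John Turturro', 'John Goodman', 'Judy Davis']),
          Movie('The Big Lebowski', 'Joel Coen', ['Jeff Bridges', 'John Goodman', 'Steve Buscemi', 'John Turturro']),
          Movie('The Big Easy', 'Jim McBride', ['Dennis Quaid', 'Ellen Barkin', 'John Goodman']),
          ]

# Flatten every movie's actor list into a single stream of names and count each name.
actor_counts = Counter(actor for movie in movies for actor in movie.actors)

# most_common() sorts by count, highest first; ties keep first-encountered order.
for actor, count in actor_counts.most_common():
    print(f'{actor}: {count}')
```

If you would rather stay in pandas, `Series.explode` (available since pandas 0.25) gives each list element its own row, after which the usual counting works, e.g. `df['actors'].explode().value_counts()`. This is an alternative to the `Counter` approach, not part of the original answer.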