python pandas matplotlib plot imdb

pandas: How to plot a pie chart of movie counts per genre for IMDB movies?


I have the following dataset:

import pandas as pd
import numpy as np 
%matplotlib inline

df = pd.DataFrame({'movie' : ['A', 'B','C','D'], 
                   'genres': ['Science Fiction|Romance|Family', 'Action|Romance',
                              'Family|Drama','Mystery|Science Fiction|Drama']},
                  index=range(4))
df

My attempt

# Parse unique genre from all the movies
gen = []
for g in df['genres']:
    gg = g.split('|')
    gen = gen + gg
    gen = list(set(gen))

print(gen)

df['genres'].value_counts().plot(kind='pie')

I got this image: [image: pie chart with one slice per combined genre string]

But I would like a pie chart with one slice for each separate genre.

How can I get the number of movies for each unique genre?


Solution

  • You can use .str.split() with expand=True, which gives you a DataFrame with one genre per cell. If you then stack that into a single Series, value_counts() returns the number of movies for each genre.

    df.genres.str.split('|', expand=True).stack().value_counts().plot(kind='pie', label='Genre')
    


    That approach can be slow at computing the counts, so a faster way to produce the same plot (with percentages added) is:

    from itertools import chain
    from collections import Counter
    import matplotlib.pyplot as plt
    
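    # Count every genre across all movies without building an intermediate DataFrame
    cts = Counter(chain.from_iterable(df.genres.str.split('|').values))
    # Convert the dict views to lists so matplotlib can turn them into arrays
    _ = plt.pie(list(cts.values()), labels=list(cts.keys()), autopct='%1.0f%%')
    _ = plt.ylabel('Genres')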
    

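To check the per-genre counts before plotting, here is a minimal sketch of two other ways to compute them. It is not part of the original answer and assumes pandas 0.25 or later, where Series.explode is available. The names genre_counts and dummy_counts are made up for illustration, and the DataFrame is the sample from the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'movie': ['A', 'B', 'C', 'D'],
    'genres': ['Science Fiction|Romance|Family', 'Action|Romance',
               'Family|Drama', 'Mystery|Science Fiction|Drama'],
})

# Split each genre string into a list, give every genre its own row,
# then count how often each genre appears.
genre_counts = df['genres'].str.split('|').explode().value_counts()
print(genre_counts)

# Alternative: one indicator column per genre, summed over all movies.
# This matches genre_counts as long as no movie lists a genre twice.
dummy_counts = df['genres'].str.get_dummies(sep='|').sum()
print(dummy_counts)
```

Either Series can then be passed to .plot(kind='pie') in the same way as the stacked result above.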