Search code examples
pandasdataframeseaborndata-analysis

Building Heatmap with two separate series having "Year" and "Month" information


I am working on a dataset

d = {'date_added_month': ['February', 'December', 'October', 'December', 'April','December', 'March', 'April'], 
     'date_added_year': [2014, 2012, 2008, 2009, 2010, 2011, 2012, 2013],
    'title': ['apple', 'ball', 'cat', 'dog', 'elephant', 'fish', 'goat', 'horse'],
    'titles_count': [0,0,0,0,0,0,0,0]}
df = pd.DataFrame(data=d)

Name of Titles segregated by Months and Years

I want to build a heatmap with years on X-axis and Months on Y-axis and count the number of titles on a particular month and year. How do I count the number of titles month and year wise?

I have counted the titles in both Month and Year basis, like this:

grp_by_yr = df.groupby("date_added_year").size()
grp_by_mn =  df.groupby("date_added_month").size()

But I am not sure how to aggregate both this information.


Solution

  • Just fill the titles_count with 1 first, since they denote 1 count per row.

    release_dist_df['titles_count'] = 1
    

    Then pivot the table like so -

    heatmap1_data = pd.pivot_table(release_dist_df, values='titles_count', 
                         index=['date_added_month'], 
                         columns='date_added_year')
    

    Then plot using seaborn -

    sns.heatmap(heatmap1_data, cmap="YlGnBu")
    

    Update

    Update with grouping as requested

    import pandas as pd
    
    d = {'date_added_month': ['February', 'February', 'December', 'October', 'December', 'April','December', 'March', 'April'],
         'date_added_year': [2014, 2014, 2012, 2008, 2009, 2010, 2011, 2012, 2013],
        'title': ['apple', 'apple-new', 'ball', 'cat', 'dog', 'elephant', 'fish', 'goat', 'horse'],
        'titles_count': [0,0,0,0,0,0,0,0,0]}
    df = pd.DataFrame(data=d)
    df['titles_count'] = 1
    
    group_by_both = df.groupby(["date_added_year", "date_added_month"]).agg({'titles_count': 'sum'})
    
    heatmap1_data = pd.pivot_table(group_by_both, values='titles_count',
                         index=['date_added_month'],
                         columns='date_added_year')
    print(heatmap1_data)
    
    import seaborn as sns
    sns_plot = sns.heatmap(heatmap1_data, cmap="YlGnBu")
    

    I also added one more data point to show that aggregation is working (2014 February). heatmap