
Seaborn stacked histogram with data from multiple columns


How can I plot multiple stacked histograms using Seaborn? I tried the following code, but it threw a dimensions error: ValueError: Length of list vectors must match length of data...

import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'id': [1,2,3,4,5,6,7,8,9,10],
                   'val1': ['a','b',np.nan,np.nan,'a','a',np.nan,np.nan,np.nan,'b'],
                   'val2': [7,0.2,5,8,np.nan,1,0,np.nan,1,1],
                   'cat': ['yes','no','no','no','yes','yes','yes','yes','no','yes'],
                  })
display(df)

# Raises the ValueError quoted above: y is given a list of two column names
sns.histplot(data=df, y=['val1', 'val2'], hue='cat', multiple='stack')


Desired plot (one stacked bar per column, counting non-null values by cat):
val1: "no" freq = 1 and "yes" freq = 4
val2: "no" freq = 4 and "yes" freq = 4


Solution

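  • I don't think you'll be able to do this directly from your current data frame. histplot treats the list passed to y as a single vector of length 2, which does not match the 10 rows of data, hence the error. You need a long-form dataframe with one row per non-null value: one column saying whether the value came from val1 or val2, and another holding yes/no.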

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    df = pd.DataFrame({'id': [1,2,3,4,5,6,7,8,9,10],
                       'val1': ['a','b',np.nan,np.nan,'a','a',np.nan,np.nan,np.nan,'b'],
                       'val2': [7,0.2,5,8,np.nan,1,0,np.nan,1,1],
                       'cat': ['yes','no','no','no','yes','yes','yes','yes','no','yes'],
                      })
    
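    # Keep the 'cat' label of every row where val1 is present; the values
    # themselves are not needed for counting, so drop them afterwards.
    val1 = df[['cat', 'val1']].dropna().drop(columns='val1')
    val1['val'] = 'val1'  # tag each row with its source column
    
    # Same for val2.
    val2 = df[['cat', 'val2']].dropna().drop(columns='val2')
    val2['val'] = 'val2'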
    
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
    plot_df = pd.concat([val1, val2]).sort_values(by='cat')
    
    sns.histplot(data=plot_df, x='val', stat='count', hue='cat', multiple='stack')
    
    plt.show()
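
The same long-form frame can probably be built in one step with DataFrame.melt instead of slicing each column by hand. The following is only a sketch of that variant, not part of the original answer: the column names 'val' and 'value' and the counts table are choices made for illustration. It uses pandas only, so the counts can be checked against the desired frequencies before plotting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1,2,3,4,5,6,7,8,9,10],
                   'val1': ['a','b',np.nan,np.nan,'a','a',np.nan,np.nan,np.nan,'b'],
                   'val2': [7,0.2,5,8,np.nan,1,0,np.nan,1,1],
                   'cat': ['yes','no','no','no','yes','yes','yes','yes','no','yes'],
                  })

# Wide -> long: one row per (id, source column) pair.
# 'val' holds the source column name, 'value' holds the original entry.
plot_df = (
    df.melt(id_vars=['id', 'cat'], value_vars=['val1', 'val2'],
            var_name='val', value_name='value')
      .dropna(subset=['value'])   # keep only the non-null entries
      .sort_values(by='cat')
)

# Sanity check: how many non-null entries each column has per category.
counts = pd.crosstab(plot_df['val'], plot_df['cat'])
print(counts)
```

Passing this plot_df to the same call as above, sns.histplot(data=plot_df, x='val', stat='count', hue='cat', multiple='stack'), should give the same stacked bars. The melt version also scales to more than two value columns by extending value_vars.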