
How to plot distributions for several bivariate groups of variables using Python


I am analysing data that is organised as follows:

  • There is a separate pandas DataFrame for each group (A, B and C).
  • Each DataFrame representing a group has 4 subgroups (columns), and its rows hold the corresponding observations.

For example, a single group of data looks like:

subgroup-1  subgroup-2  subgroup-3  subgroup-4
        12           4         NaN           9
        15           3           4         NaN
        16           8           3          11
        17          12           8          13
        11          17          12          14
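A minimal sketch of how the example group above could be built as a pandas DataFrame, and how several groups could be combined into one "long" table (one row per group, subgroup and value), which many plotting functions accept directly. Groups B and C here are hypothetical, derived from A only to show the layout; the names `groups` and `long_df` are illustrative, and reshaping with `DataFrame.melt` is one common option rather than something the question requires.

```python
import numpy as np
import pandas as pd

# The example group from the question; NaN marks a missing observation
group_A = pd.DataFrame({
    'subgroup-1': [12, 15, 16, 17, 11],
    'subgroup-2': [4, 3, 8, 12, 17],
    'subgroup-3': [np.nan, 4, 3, 8, 12],
    'subgroup-4': [9, np.nan, 11, 13, 14],
})

# Hypothetical groups B and C, derived from A only to illustrate the layout
groups = {'A': group_A, 'B': group_A + 1, 'C': group_A * 2}

# Reshape to "long" format: one row per (group, subgroup, value),
# dropping missing observations so each subgroup keeps only real values
long_df = pd.concat(
    [df.melt(var_name='subgroup', value_name='value').assign(group=name)
     for name, df in groups.items()],
    ignore_index=True,
).dropna(subset=['value'])

print(long_df.head())
```

Dropping the NaNs per row means subgroups can end up with different numbers of observations, which is usually what you want when comparing distributions.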

I want to visualise the distribution of each subgroup across the different groups. What options are available in Python for this (i.e. which chart types can I use)? Thanks.

I tried histograms and density plots, but they only seem to work for 2 variables.
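One chart type that shows every subgroup of every group in a single figure is a grouped box plot. The sketch below is an assumption-laden example, not a definitive recipe: the data is random and hypothetical (same layout as the question), and `grouped_boxplot` is an illustrative helper name. It clusters the boxes by subgroup and offsets each group within a cluster using the `positions` argument of `Axes.boxplot`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data in the question's layout: one DataFrame per group, 4 subgroup columns
rng = np.random.default_rng(0)
columns = ['subgroup-1', 'subgroup-2', 'subgroup-3', 'subgroup-4']
groups = {name: pd.DataFrame(rng.normal(loc, 1.0, size=(50, 4)), columns=columns)
          for name, loc in [('A', 0.0), ('B', 0.5), ('C', 1.0)]}


def grouped_boxplot(groups, columns, width=0.25):
    """Draw one box per (subgroup, group) pair, clustered by subgroup."""
    fig, ax = plt.subplots(figsize=(8, 4))
    names = list(groups)
    for i, name in enumerate(names):
        # dropna() lets each subgroup keep its own number of observations
        data = [groups[name][col].dropna().to_numpy() for col in columns]
        # Shift this group's boxes so the groups sit side by side in each cluster
        positions = np.arange(len(columns)) + (i - (len(names) - 1) / 2) * width
        bp = ax.boxplot(data, positions=positions, widths=width * 0.9,
                        patch_artist=True, manage_ticks=False)
        colour = f'C{i}'
        for box in bp['boxes']:
            box.set_facecolor(colour)
        # Empty line used only to create a legend entry for this group
        ax.plot([], [], color=colour, linewidth=8, label=name)
    ax.set_xticks(np.arange(len(columns)))
    ax.set_xticklabels(columns)
    ax.legend(title='group')
    return fig, ax


fig, ax = grouped_boxplot(groups, columns)
plt.show()
```

If you want to see the shape of each distribution rather than its quartiles, `ax.violinplot` also takes `positions` and `widths`, so the same layout idea applies, although its return value and styling options differ from `boxplot`.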


Solution

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # Seed once, before generating any data, so every run draws the same numbers
    np.random.seed(19680801)

    # Example data: one pandas DataFrame per group, each with 4 subgroup columns
    columns = ['subgroup-1', 'subgroup-2', 'subgroup-3', 'subgroup-4']
    group_A = pd.DataFrame(np.random.rand(50, 4), columns=columns)
    group_B = pd.DataFrame(np.random.rand(50, 4), columns=columns)
    group_C = pd.DataFrame(np.random.rand(50, 4), columns=columns)


    def plot_hist(subgroup):
        """Compare the distribution of one subgroup across groups A, B and C."""
        n_bins = 10

        # Put the chosen subgroup of each group side by side in a (50, 3) array;
        # ax.hist treats each column as a separate dataset
        x = np.column_stack([group_A[subgroup], group_B[subgroup], group_C[subgroup]])

        fig, axes = plt.subplots(nrows=2, ncols=2)
        ax0, ax1, ax2, ax3 = axes.flatten()

        # Grouped bars, one colour per group
        ax0.hist(x, n_bins, density=True, histtype='bar', label=['A', 'B', 'C'])
        ax0.legend(prop={'size': 10})
        ax0.set_title('bars with legend')

        # Bars of the three groups stacked on top of each other
        ax1.hist(x, n_bins, density=True, histtype='bar', stacked=True)
        ax1.set_title('stacked bar')

        # Stacked outlines without fill
        ax2.hist(x, n_bins, histtype='step', stacked=True, fill=False)
        ax2.set_title('stack step (unfilled)')

        # Datasets of different lengths can be passed as a list of arrays
        # (this panel uses separate random data, not the groups above)
        x_multi = [np.random.randn(n) for n in [10000, 5000, 2000]]
        ax3.hist(x_multi, n_bins, histtype='bar')
        ax3.set_title('different sample sizes')

        fig.tight_layout()
        plt.show()


    plot_hist('subgroup-1')
    
