Plot distribution of pandas dataframe depending on target value


I want to visualize the grade depending on the sex (male/female).

My dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7", "K8", "K9"],
        "grade": [1.0, 2.0, 4.0, 1.0, 5.0, 2.0, 3.0, 1.0, 6.0, 3.0],
        "sex": [1, 0, 0, 1, 0, 1, 0, 1, 0, 0],
    }
)


  key  grade  sex
0  K0    1.0    1
1  K1    2.0    0
2  K2    4.0    0
3  K3    1.0    1
4  K4    5.0    0
5  K5    2.0    1
6  K6    3.0    0
7  K7    1.0    1
8  K8    6.0    0
9  K9    3.0    0

My approach was to plot a histogram of the grades. However, I don't know how to show the distribution separately for each value of the target (sex). There are some examples in the Seaborn documentation, but I could not apply them to my specific problem.

All I have is this:

import matplotlib.pyplot as plt

plt.hist(df['grade'], bins=10, edgecolor='black')
plt.xlabel('grade')
plt.ylabel('count')


Solution

  • You can do this in matplotlib:

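    import matplotlib.pyplot as pyplot
    
    # Split the grades by the value of the target column.
    x=df.loc[df['sex']==1, 'grade']
    y=df.loc[df['sex']==0, 'grade']
    
    # Bin edges 1..7 so that every grade from 1 to 6 gets its own bin.
    bins=list(range(1, 8))
    
    # Overlay both histograms; alpha keeps the overlapping bars visible.
    pyplot.hist(x, bins, alpha=0.5, label='sex=1')
    pyplot.hist(y, bins, alpha=0.5, label='sex=0')
    pyplot.xlabel('grade')
    pyplot.ylabel('count')
    pyplot.legend(loc='upper right')
    pyplot.show()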
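
If the overlapping bars are hard to read, an alternative is to count the grades per sex first and draw the counts side by side. This is only a sketch that assumes the grades are discrete values, as in the example data; the name `counts` is just illustrative, and the dataframe is repeated so the snippet runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7", "K8", "K9"],
        "grade": [1.0, 2.0, 4.0, 1.0, 5.0, 2.0, 3.0, 1.0, 6.0, 3.0],
        "sex": [1, 0, 0, 1, 0, 1, 0, 1, 0, 0],
    }
)

# Rows are the grades, columns are the sex values, cells are occurrence counts.
counts = pd.crosstab(df["grade"], df["sex"])

# Grouped bar chart: one bar per sex value at each grade.
ax = counts.plot(kind="bar", edgecolor="black")
ax.set_xlabel("grade")
ax.set_ylabel("count")
ax.legend(title="sex")
```

In a notebook the figure is shown automatically; in a plain script, call `plt.show()` at the end. Printing `counts` also gives the numbers behind the plot, which is handy for checking that no grade was dropped.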