I have a pandas DataFrame with three columns: 'principal component 1', 'principal component 2' and 'Class'. The Class column holds either 1 (Pass) or 0 (Fail). I plotted a 2D scatter of principal component 1 vs. principal component 2, but I want to choose the colors used for the Pass and Fail points.
Here F2 is the DataFrame with the 'principal component 1', 'principal component 2' and 'Class' columns:
import matplotlib.pyplot as plt
import seaborn as sns
targets = [1, 0]
Class = ['Pass', 'Fail']
colors = ['k', 'r']
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel('Principal Component 1', fontsize=15)
ax.set_ylabel('Principal Component 2', fontsize=15)
ax.set_title('2 component PCA', fontsize=20)
for target, color in zip(targets, colors):
    indicesToKeep = F2['Class'] == target
    sns.scatterplot(F2.loc[indicesToKeep, 'principal component 1'],
                    F2.loc[indicesToKeep, 'principal component 2'], c=color, alpha=0.85)
ax.legend(Class)
ax.grid()
plt.show()
If I pass c=color to scatterplot to make 'Pass' black and 'Fail' red, I get the following error:
ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not array('k', dtype='<U1')
If I remove it, seaborn falls back to its default colors.
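A minimal sketch of the same loop with `color=` in place of `c=`, assuming seaborn 0.12 or later (where `x` and `y` must be passed as keywords). In seaborn, `color` gives every point drawn by one `scatterplot` call a single color; `c` is matplotlib's lower-level argument that seaborn passes through to `Axes.scatter`, and since seaborn also sets a color itself, mixing in `c` is a likely source of the error. The random `F2` below is a hypothetical stand-in for the real data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for F2: random PCA scores and a 0/1 Class column.
rng = np.random.default_rng(0)
F2 = pd.DataFrame({
    'principal component 1': rng.normal(size=100),
    'principal component 2': rng.normal(size=100),
    'Class': rng.integers(0, 2, size=100),
})

targets = [1, 0]
labels = ['Pass', 'Fail']
colors = ['k', 'r']

fig, ax = plt.subplots(figsize=(8, 8))
for target, label, color in zip(targets, labels, colors):
    subset = F2[F2['Class'] == target]
    # color= (not c=) paints every point of this call one color;
    # x/y are passed by keyword, as newer seaborn versions require.
    sns.scatterplot(data=subset, x='principal component 1',
                    y='principal component 2', color=color, alpha=0.85,
                    label=label, ax=ax)
ax.set_title('2 component PCA', fontsize=20)
ax.legend()  # picks up the Pass/Fail labels in plotting order
plt.show()
```

Passing `label=` to each call lets `ax.legend()` find the names itself, rather than relying on `ax.legend(Class)` matching names to artists by position.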
Also, I want the seaborn plot background to show a grid, but sns.set_style("darkgrid") is not working.
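The question doesn't show where `sns.set_style` is called, so this is only a guess at two common causes. `set_style` changes matplotlib's rcParams, which are read when an axes is created, so it should run before `plt.figure()`/`add_subplot`. Separately, a bare `ax.grid()` with no arguments toggles the grid, so on a style whose grid is already on (such as "darkgrid") it switches the grid off. A minimal sketch with hypothetical random points:

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Set the style BEFORE creating the figure: it updates matplotlib's
# rcParams, which are applied when the axes are created.
sns.set_style("darkgrid")

fig, ax = plt.subplots(figsize=(8, 8))
rng = np.random.default_rng(0)
# Hypothetical random points, only so the axes have some content.
sns.scatterplot(x=rng.normal(size=50), y=rng.normal(size=50), ax=ax)

# No bare ax.grid() here: without arguments it toggles the grid, which
# would turn the "darkgrid" grid off. Use ax.grid(True) to be explicit.
plt.show()
```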
Without your input data it's hard to be sure, but perhaps you can try something like this:
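import matplotlib.pyplot as plt
import seaborn as sns
# Color and legend label for each Class value
params = {1: {'color': 'k', 'label': 'Pass'},
          0: {'color': 'r', 'label': 'Fail'}}
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)
# One scatterplot call per class, each with its own color and label
for klass, subdf in F2.groupby('Class'):
    sns.scatterplot(data=subdf, x='principal component 1', y='principal component 2',
                    ax=ax, **params[klass])
ax.legend()
plt.show()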
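An alternative to the per-class loop is a single `scatterplot` call using seaborn's `hue` and `palette` parameters. This sketch assumes seaborn 0.12 or later; the `Result` column and the random `F2` are hypothetical. Mapping 0/1 to strings first makes the legend read Pass/Fail, and `hue_order` fixes the legend order.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for F2 (random scores, Class in {0, 1}).
rng = np.random.default_rng(0)
F2 = pd.DataFrame({
    'principal component 1': rng.normal(size=100),
    'principal component 2': rng.normal(size=100),
    'Class': rng.integers(0, 2, size=100),
})

# Map the 0/1 codes to readable labels so the legend shows Pass/Fail.
plot_df = F2.assign(Result=F2['Class'].map({1: 'Pass', 0: 'Fail'}))

fig, ax = plt.subplots(figsize=(8, 8))
sns.scatterplot(data=plot_df, x='principal component 1',
                y='principal component 2', hue='Result',
                hue_order=['Pass', 'Fail'],
                palette={'Pass': 'k', 'Fail': 'r'},  # one color per label
                alpha=0.85, ax=ax)
ax.set_title('2 component PCA', fontsize=20)
plt.show()
```

With a dict `palette`, each hue level gets exactly the color listed for it, so a third class would only need one more entry in the mapping and the palette.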