I have a dataframe (nb - the data is dummy data and does not represent what is in the plots):
Index | BGC frequency - Count | Proportion of total BGCs both captured and not captured by antiSMASH - %
species_a | 1 | 2
species_b | 3 | 4
... | ... | ...
I want to make a scatter plot of 'BGC frequency - Count' vs 'Proportion of total BGCs both captured and not captured by antiSMASH - %', with points coloured according to the categorical Index, and a legend.
import matplotlib.pyplot as plt
from matplotlib import colors
import pandas as pd

colorlist = list(colors.ColorConverter.colors.keys())
captured_df.plot.scatter(x='BGC frequency - Count',
                         y='Proportion of total BGCs both captured and not captured by antiSMASH - %',
                         c=colorlist,
                         title='BGCs with an antiSMASH region')
Gets me close:
But I can't get a legend. Ideally I'd want something like what is shown here, line 69:
But when I tried:
df.plot.scatter(x='BGC frequency - Count', y='Proportion of total BGCs both captured and not captured by antiSMASH - %', c=df.index, cmap="viridis", s=50)
I get:
ValueError: 'c' argument must be a mpl color, a sequence of mpl colors or a sequence of numbers, not Index(...list of index species names...)
I'm not sure why this is - I thought cmap converts the c data into a list of the correct data type? The link above is explicitly dealing with categorical data:
"If a categorical column is passed to c, then a discrete colorbar will be produced"
Also please note I don't want a numerical color bar - this would not be much use:
Thanks for reading :D
The trick is to convert the "type" column to categorical (in your case the Index, which has to be a regular column first, because c takes a column name).
For example:
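import matplotlib.pyplot as plt
import pandas as pd

d = pd.DataFrame([["a", 1, 3], ["b", 3, 3], ["b", 2, 3], ["a", 5, 2]], columns=['type', 'x', 'y'])
d['type'] = pd.Categorical(d['type'])  # categorical dtype is what triggers the discrete colorbar
d.plot.scatter(x='x', y='y', c='type', cmap='inferno')
plt.show()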
This should work.
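Applied to a frame shaped like the one in the question, it might look like the sketch below. The data are made up, and the index name "species" is my own placeholder, since the question does not say what the index is called. The species names are moved out of the index with reset_index() and then converted to categorical.

```python
import matplotlib.pyplot as plt
import pandas as pd

X_COL = "BGC frequency - Count"
Y_COL = "Proportion of total BGCs both captured and not captured by antiSMASH - %"

# Dummy data shaped like the question's frame: species names as the index.
# The index name "species" is a placeholder, not taken from the question.
df = pd.DataFrame(
    {X_COL: [1, 3, 2], Y_COL: [2, 4, 5]},
    index=pd.Index(["species_a", "species_b", "species_c"], name="species"),
)

# c= takes a column name, so move the index into a regular column first,
# then make that column categorical.
plot_df = df.reset_index()
plot_df["species"] = pd.Categorical(plot_df["species"])

ax = plot_df.plot.scatter(x=X_COL, y=Y_COL, c="species", cmap="viridis", s=50)
ax.set_title("BGCs with an antiSMASH region")
```

Call plt.show() afterwards to display the figure. The colorbar should then be labelled with the species names rather than with numbers.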
Also it is worth mentioning that this feature was added in pandas 1.3.0 (released July 2, 2021)!
Make sure you are running that version or later.
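Note that the approach above gives a discrete colorbar, not a legend in the strict sense. If you want an actual legend box, or are stuck on an older pandas, one alternative is to draw each category with its own matplotlib scatter call and a label. This is only a sketch of that idea, using the same dummy data as above with placeholder column names "species", "x" and "y":

```python
import matplotlib.pyplot as plt
import pandas as pd

# Dummy data; "species", "x" and "y" are placeholder column names.
df = pd.DataFrame(
    {"species": ["a", "b", "b", "a"], "x": [1, 3, 2, 5], "y": [3, 3, 3, 2]}
)

fig, ax = plt.subplots()
# One scatter call per category: each call takes the next colour from the
# default colour cycle and gets its own legend entry via label=.
for name, group in df.groupby("species"):
    ax.scatter(group["x"], group["y"], s=50, label=name)

ax.set_xlabel("x")
ax.set_ylabel("y")
legend = ax.legend(title="species")
```

With many species the default colour cycle will start repeating, so in that case you would probably want to pass explicit colours to each call.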