When I convert a column to the categorical type and map an aesthetic property (in aes()) to it, I get the following error:
NotImplementedError: isna is not defined for MultiIndex
Here's a reproducible example:
import numpy as np
import pandas as pd
from ggplot import ggplot, aes, geom_density

randCat = np.random.randint(0,2,500)
randProj = np.random.rand(1,500)
df = pd.DataFrame({'proj': np.ravel(randProj),'cat': np.ravel(randCat)})
df['cat'] = df['cat'].map({0:'firstCat', 1:'secondCat'})
# converting to category is what triggers the error below
df['cat'] = df['cat'].astype('category')
g = ggplot(aes(x='proj', color='cat',fill='cat'), data=df) + geom_density(alpha=0.7)
print(g)
I'm using pandas version 0.22.0 and ggplot 0.11.5.
Interestingly enough, the plot comes out fine when I don't convert the "cat" column to the categorical type (i.e. it remains a string column). However, for other purposes I need this column to be categorical.
A more complete trace of the error:
54 # hack (for now) because MI registers as ndarray
55 elif isinstance(obj, ABCMultiIndex):
---> 56 raise NotImplementedError("isna is not defined for MultiIndex")
57 elif isinstance(obj, (ABCSeries, np.ndarray, ABCIndexClass)):
58 return _isna_ndarraylike(obj)
NotImplementedError: isna is not defined for MultiIndex
Thanks, Eyal.
It's probably an edge case that causes ggplot in combination with pandas to fail.
Looking at the source code of ggplot, we find at the end of ggplot.py, in _construct_plot_data:
groups = [column for _, column in discrete_aes]
if groups:
    return mappers, data.groupby(groups)
else:
    return mappers, [(0, data)]
So my guess is that the category is used for the groupby, which causes pandas to break.
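To make that guess a bit more concrete, here is a minimal sketch of how the group keys in the snippet above would be built. The shape of discrete_aes (a list of (aesthetic, column) pairs) is an assumption made for illustration, not something taken from ggplot's source. The data uses an object column, so the groupby at the end only shows what ggplot would iterate over in the working case.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
data = pd.DataFrame({'proj': rng.rand(500), 'cat': rng.randint(0, 2, 500)})
data['cat'] = data['cat'].map({0: 'firstCat', 1: 'secondCat'}).astype(object)

# Assumed shape of discrete_aes: one (aesthetic, column) pair per discrete mapping.
discrete_aes = [('color', 'cat'), ('fill', 'cat')]

# Same list comprehension as in the ggplot snippet quoted above.
groups = [column for _, column in discrete_aes]
print(groups)  # ['cat', 'cat']

# With an object column, grouping by 'cat' yields one sub-frame per category.
grouped = data.groupby('cat')
for name, frame in grouped:
    print(name, len(frame))
```

Note that with both color and fill mapped to 'cat', the same column appears twice in the group keys, which may be part of why the combination with a categorical column ends up on the MultiIndex code path.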
Try casting to object instead of category and, in the case of geom_density, remove the fill='cat', as this causes the lines and legend to be rendered twice:
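import numpy as np
import pandas as pd
from ggplot import ggplot, aes, geom_density

randCat = np.random.randint(0,2,500)
randProj = np.random.rand(1,500)
df = pd.DataFrame({'proj': np.ravel(randProj),'cat': np.ravel(randCat)})
df['cat'] = df['cat'].map({0:'firstCat', 1:'secondCat'})
# object dtype instead of category, so the groupby inside ggplot works
df['cat'] = df['cat'].astype('object')
# only color is mapped; fill='cat' is dropped
g = ggplot(aes(x='proj', color='cat'), data=df) + geom_density(alpha=0.7)
print(g)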
See also http://ggplot.yhathq.com/how-it-works.html and http://ggplot.yhathq.com/docs/geom_density.html
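Since the question mentions that the column needs to stay categorical for other purposes, one possible approach is to keep the categorical column and add an object-typed copy just for plotting. This is only a sketch: the column name cat_plot is a hypothetical name chosen for illustration, and the ggplot call itself is left as a comment.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({'proj': rng.rand(500), 'cat': rng.randint(0, 2, 500)})
df['cat'] = df['cat'].map({0: 'firstCat', 1: 'secondCat'}).astype('category')

# 'cat_plot' is a hypothetical helper column: same labels, plain object dtype.
df['cat_plot'] = df['cat'].astype(object)

print(df.dtypes)

# The plot would then map the aesthetic to the helper column, e.g.:
# ggplot(aes(x='proj', color='cat_plot'), data=df) + geom_density(alpha=0.7)
```

The original 'cat' column keeps its category dtype, so any code that relies on it being categorical is unaffected.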