Tags: python, python-3.x, pandas, python-ggplot

Error using categorical column in geom_density


When I convert a column to the categorical dtype and use it in an aesthetic mapping (aes()), I get the following error:

NotImplementedError: isna is not defined for MultiIndex

Here's a reproducible example:

import numpy as np
import pandas as pd
from ggplot import ggplot, aes, geom_density

randCat = np.random.randint(0, 2, 500)
randProj = np.random.rand(1, 500)
df = pd.DataFrame({'proj': np.ravel(randProj), 'cat': np.ravel(randCat)})
df['cat'] = df['cat'].map({0: 'firstCat', 1: 'secondCat'})

# Casting to category is the step that triggers the error
df['cat'] = df['cat'].astype('category')
g = ggplot(aes(x='proj', color='cat', fill='cat'), data=df) + geom_density(alpha=0.7)
print(g)

I'm using pandas 0.22.0 and ggplot 0.11.5.

Interestingly enough, the plot comes out fine when I leave the "cat" column as strings instead of converting it to a categorical type. However, I need this column to be categorical for other purposes.

A more complete trace of the error:

     54     # hack (for now) because MI registers as ndarray
     55     elif isinstance(obj, ABCMultiIndex):
---> 56         raise NotImplementedError("isna is not defined for MultiIndex")
     57     elif isinstance(obj, (ABCSeries, np.ndarray, ABCIndexClass)):
     58         return _isna_ndarraylike(obj)

NotImplementedError: isna is not defined for MultiIndex

Thanks, Eyal.


Solution

  • It's probably an edge case that causes ggplot in combination with pandas to fail.

    Looking at the ggplot source code, the end of _construct_plot_data in ggplot.py reads:

    groups = [column for _, column in discrete_aes]
    if groups:
        return mappers, data.groupby(groups)
    else:
        return mappers, [(0, data)]
    

    So my guess is that the categorical column ends up in this groupby call, and that is what causes pandas to break.

    Try casting to object instead of category. In the case of geom_density, also remove fill='cat', as it causes the lines and legend to be rendered twice:

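    import numpy as np
    import pandas as pd
    from ggplot import ggplot, aes, geom_density

    randCat = np.random.randint(0, 2, 500)
    randProj = np.random.rand(1, 500)
    df = pd.DataFrame({'proj': np.ravel(randProj), 'cat': np.ravel(randCat)})
    df['cat'] = df['cat'].map({0: 'firstCat', 1: 'secondCat'})
    # Use object dtype rather than category for the grouping column
    df['cat'] = df['cat'].astype('object')

    # Only color is mapped; fill='cat' is dropped to avoid duplicate lines and legend
    g = ggplot(aes(x='proj', color='cat'), data=df) + geom_density(alpha=0.7)
    print(g)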
    

    See also http://ggplot.yhathq.com/how-it-works.html and http://ggplot.yhathq.com/docs/geom_density.html
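
Since the question says the column has to stay categorical for other purposes, one option is to leave the original frame alone and cast only a plotting copy to object. The following is a minimal sketch of that idea using pandas alone. The helper name as_plot_frame is hypothetical and not part of ggplot or pandas, and the sketch assumes the object cast is what avoids the error, as suggested above.

```python
import numpy as np
import pandas as pd


def as_plot_frame(df, columns):
    """Return a copy of df with the given columns cast to object dtype.

    Hypothetical helper for illustration; the input frame is not modified.
    """
    plot_df = df.copy()
    for col in columns:
        plot_df[col] = plot_df[col].astype(object)
    return plot_df


# Same shape of data as in the question, with a fixed seed for repeatability
rng = np.random.RandomState(0)
df = pd.DataFrame({
    'proj': rng.rand(500),
    'cat': rng.randint(0, 2, 500),
})
df['cat'] = df['cat'].map({0: 'firstCat', 1: 'secondCat'}).astype('category')

plot_df = as_plot_frame(df, ['cat'])

# The original keeps its categorical dtype; only the copy is changed
print(df['cat'].dtype)       # category
print(plot_df['cat'].dtype)  # object
```

plot_df can then be passed as data= to the ggplot call from the answer, while df stays categorical for everything else. Note that the object copy no longer carries the category order, so the legend order may differ from the categorical one.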