
How to add a legend in plotnine for multiple curves when "tidy data" is not the issue


Several people have asked how to add a legend in ggplot2 or plotnine for multiple curves when the curves differ by which rows are plotted. The typical answer is to reshape the data into tidy (long) format. Some examples are here, here, and here.

I need multiple lines not because I am subsetting the data, but because I want to compare smoothing methods. The data are the same for every line, so the answers above don't help.

The latter two answers point out that in ggplot2 (R), a legend can be created by moving the color specification inside aes(...). This is described in detail here, in an example similar to what I want to do.

Is this supposed to work in plotnine as well? I tried an example similar to the previous link. It works fine without the legend:

from plotnine import *
from plotnine.data import *

(ggplot(faithful, aes(x='waiting'))
    + geom_line(stat='density', adjust=0.5, color='red')
    + geom_line(stat='density', color='blue')
    + geom_line(stat='density', adjust=2, color='green')
    + labs(title='Effect of varying KDE smoothing parameter',
           x='Time to next eruption (min)',
           y='Density')
)

[Figure: graph without legend]
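For context on what the three curves compare: in ggplot2, `adjust` multiplies the automatically chosen KDE bandwidth, so `adjust=0.5` gives a wigglier curve and `adjust=2` a smoother one. The sketch below is a minimal pure-Python Gaussian KDE, assuming R's `bw.nrd0` rule of thumb as the base bandwidth; plotnine's actual bandwidth default and kernel implementation may differ between versions. The sample data are hypothetical stand-ins for `faithful['waiting']`.

```python
import math
import random
import statistics


def bw_nrd0(data):
    """Silverman's rule of thumb, modeled on R's bw.nrd0 (assumed base bandwidth).

    R's zero-spread fallbacks are omitted in this sketch.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = min(statistics.stdev(data), (q3 - q1) / 1.34)
    return 0.9 * spread * len(data) ** (-1 / 5)


def gaussian_kde(data, xs, adjust=1.0):
    """Evaluate a Gaussian KDE at each x in xs; adjust scales the bandwidth."""
    h = adjust * bw_nrd0(data)
    norm = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
            for x in xs]


# Hypothetical bimodal sample standing in for faithful['waiting'].
rng = random.Random(0)
sample = ([rng.gauss(55, 6) for _ in range(100)]
          + [rng.gauss(80, 6) for _ in range(170)])

grid = [40 + 0.25 * i for i in range(221)]  # 40 to 95 minutes
curves = {adj: gaussian_kde(sample, grid, adjust=adj) for adj in (0.5, 1, 2)}
for adj, ys in curves.items():
    print(f"adjust={adj}: peak density {max(ys):.4f}")
```

Because a wider Gaussian kernel is equivalent to further smoothing the narrower estimate, the peak height can only stay the same or drop as `adjust` grows.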

But it fails when I move color into aes in order to get a legend:

from plotnine import *
from plotnine.data import *

(ggplot(faithful, aes(x='waiting'))
    + geom_line(aes(color='red'), stat='density', adjust=0.5)
    + geom_line(aes(color='blue'), stat='density')
    + geom_line(aes(color='green'), stat='density', adjust=2)
    + labs(title='Effect of varying KDE smoothing parameter',
           x='Time to next eruption (min)',
           y='Density')
    + scale_color_identity(guide='legend')
)

This gives the error PlotnineError: "Could not evaluate the 'color' mapping: 'red' (original error: name 'red' is not defined)".

Any suggestions for how to add a legend? Thanks.


Solution

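  • The last link you posted was on the right track, but you have to work around how plotnine evaluates aes() mappings. Where R uses non-standard evaluation, plotnine evaluates each mapping string as a Python expression against the data's columns, so 'red' is looked up as a variable named red. I was able to get it to work by putting two sets of quotes around the color names, which turns each one into a string literal: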

    from plotnine import *
    from plotnine.data import *

    (ggplot(faithful, aes(x='waiting'))
        + geom_line(aes(color="'red'"), stat='density', adjust=0.5)
        + geom_line(aes(color="'blue'"), stat='density')
        + geom_line(aes(color="'green'"), stat='density', adjust=2)
        + labs(title='Effect of ...',
               x='Time to next eruption (min)',
               y='Density')
        + scale_color_identity(guide='legend', name='My color legend')
    )
    


    You can also set your own legend labels, as in the linked post:

    from plotnine import *
    from plotnine.data import *

    (ggplot(faithful, aes(x='waiting'))
        + geom_line(aes(color="'red'"), stat='density', adjust=0.5)
        + geom_line(aes(color="'blue'"), stat='density')
        + geom_line(aes(color="'green'"), stat='density', adjust=2)
        + labs(title='Effect of ...',
               x='Time to next eruption (min)',
               y='Density')
        + scale_color_identity(guide='legend', name='My colors',
                               breaks=['red', 'blue', 'green'],
                               labels=['Label 1', 'Label 2', 'Label 3'])
    )
    

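Why the extra quotes help can be sketched with plain Python `eval`. This is only a rough analogy, assuming plotnine resolves an aes() string roughly like an expression evaluated over the data's column names; its real evaluation machinery is more involved. The `evaluate_mapping` helper and the `columns` dict are hypothetical. The last call fails with the same `name 'red' is not defined` message seen in the question, while the doubly quoted `"'red'"` evaluates to the plain string `'red'`.

```python
# A rough analogy for how plotnine resolves aes() strings; plotnine's real
# evaluation machinery differs, this only mimics the name lookup.
columns = {"waiting": [79, 54, 74]}  # hypothetical stand-in for DataFrame columns


def evaluate_mapping(expr, namespace):
    """Evaluate an aes()-style string as a Python expression over column names."""
    return eval(expr, {"__builtins__": {}}, namespace)


print(evaluate_mapping("waiting", columns))  # column name -> the column's values
print(evaluate_mapping("'red'", columns))    # quoted literal -> the string 'red'
try:
    evaluate_mapping("red", columns)         # bare name -> lookup fails
except NameError as err:
    print(f"NameError: {err}")               # NameError: name 'red' is not defined
```

With `scale_color_identity`, that literal string is then used directly as the line color, and `guide='legend'` asks for a legend entry for it.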