
Seaborn scatterplot can't get hue_order to work


I have a Seaborn scatterplot and am trying to control the plotting order with 'hue_order', but it is not working as I would have expected (I can't get the blue dot to show on top of the gray).

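import pandas as pd
import seaborn as sns

x = [1, 2, 3, 1, 2, 3]
cat = ['N', 'Y', 'N', 'N', 'N']
# zip stops at the shorter list, so the frame has 5 rows
test = pd.DataFrame(list(zip(x, cat)), columns=['x', 'cat'])
display(test)  # Jupyter/IPython only; use print(test) elsewhere

colors = {'N': 'gray', 'Y': 'blue'}
sns.scatterplot(data=test, x='x', y='x',
                hue='cat', hue_order=['Y', 'N'],
                palette=colors,
               )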


Flipping the 'hue_order' to hue_order=['N', 'Y'] doesn't change the plot. How can I get the 'Y' category to plot on top of the 'N' category? My actual data has duplicate (x, y) coordinates that are differentiated only by the category column.


Solution

  • This happens because, unlike most plotting functions, scatterplot doesn't (internally) iterate over the hue levels when it constructs the plot. It draws a single scatterplot and then sets the colors of the elements with a vector. It does this so that you don't end up with all of the points from the final hue level on top of all the points from the penultimate hue level, and so on down the stack. But it means that the scatterplot z-ordering is insensitive to the hue ordering and reflects only the order of the rows in the input data: later rows are drawn on top of earlier ones.

    So you could use your desired hue order to sort the input data:

    import numpy as np

    hue_order = ["N", "Y"]
    colors = {'N': 'gray', 'Y': 'blue'}
    sns.scatterplot(
        # Sort rows by position in hue_order so the last level is drawn last (on top)
        data=test.sort_values('cat', key=np.vectorize(hue_order.index)),
        x='x', y='x',
        hue='cat', hue_order=hue_order,
        palette=colors, s=100,  # Embiggen the points to see what's happening
    )
    


    There may be a more efficient way to do that "sort by list of unique values" built into pandas; I am not sure.
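
For what it's worth, here is a minimal sketch of two pandas-only ways to do that "sort by list of unique values" step. It is not part of the original answer; the names `rank`, `sorted_by_map`, `as_cat` and `sorted_by_cat` are made up for illustration, and the frame is rebuilt by hand to match the five rows from the question. Whether either is actually faster than `np.vectorize` on real data is untested here.

```python
import pandas as pd

# Same toy data as in the question (five rows).
test = pd.DataFrame({
    "x": [1, 2, 3, 1, 2],
    "cat": ["N", "Y", "N", "N", "N"],
})

hue_order = ["N", "Y"]  # the last level should end up last, i.e. drawn on top

# Option 1: map each level to its position in hue_order and sort on that.
rank = {level: i for i, level in enumerate(hue_order)}
sorted_by_map = test.sort_values("cat", key=lambda s: s.map(rank), kind="stable")

# Option 2: an ordered categorical sorts by category order, not alphabetically.
as_cat = test.assign(
    cat=pd.Categorical(test["cat"], categories=hue_order, ordered=True)
)
sorted_by_cat = as_cat.sort_values("cat", kind="stable")

print(sorted_by_map)
```

Passing `sorted_by_map` as `data=` to `sns.scatterplot` in place of the `np.vectorize` sort should give the same drawing order, since the 'Y' row now comes last; the 'cat' column stays plain strings, so the `colors` dict still applies.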