r ggplot2 ggridges

ggridges not showing all factors in legend


I'm trying to create a ridge plot of this data:

long_data <- structure(list(Cell_Type = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 
6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 
10L, 10L, 10L), .Label = c("CD4..T.cells", "CD4..Tcm", "CD8..Tcm", 
"CD8..Tem", "Class.switched.memory.B.cells", "Monocytes", "MPP", 
"naive.B.cells", "NK.cells", "Plasma.cells"), class = "factor"), 
    Sample = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 
    2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 
    1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 
    4L), .Label = c("Sample1", "Sample2", "Sample3", "Sample4"
    ), class = "factor"), Percentage = c(39, 35.1, 12.7405962, 
    4.995628, 11.4, 10.9, 8.9116355, 10.1955, 4.5, 6.1, 18.2643388, 
    7.005608, 5.9, 6, 4.9746425, 10.288099, 2.8, 2.8, 13.7408777, 
    19.657708, 18.8, 21.4, 5.7131852, 19.657708, 0.6, 0.8, 0.2308895, 
    18.295758, 7, 6.2, 7.210278, 3.383666, 9.9, 9.8, 16.9838566, 
    9.087761, 0.1, 1, 11.2297, 10.511573), outlier = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 
    1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("no", "yes"
    ), class = "factor")), class = "data.frame", row.names = c(NA, 
-40L))

The column outlier is a factor with two levels, "yes" and "no", but when I create the plot, all the rows with "yes" for outlier are completely ignored.

Here's an example of the ggplot I'm getting. I seem to be missing something, because the expected output is a legend with "no" and "yes" and points in two different colors.

This is the code to produce it:

ggplot(long_data, aes(x = Percentage, y = reorder(Cell_Type, Percentage))) +
  geom_density_ridges(scale = 2, alpha = 0.2) +
  geom_density_ridges(alpha = 0, color = NA, jittered_points = TRUE, point_alpha = 1, aes(point_color = outlier)) +
  xlab("Percentage") + ylab("Cell Type")

What am I missing here?


Solution

  • Although it is not documented, geom_density_ridges appears to assume that point_color varies in step with the fill (that is, with the ridge grouping). For example, with iris, mapping point_color to an unrelated variable gives something you did not intend:

    library(ggplot2)
    library(ggridges)

    dat <- iris
    # random two-level factor, unrelated to Species
    dat$new <- factor(sample(1:2, nrow(iris), replace = TRUE))
    ggplot(dat, aes(x = Sepal.Length, y = Species, fill = Species)) +
      geom_density_ridges(aes(point_color = new), alpha = .2, jittered_points = TRUE)
    


    If you don't need a legend and have only two colors, you can use the solution @markhogue proposed.

    Otherwise, draw the points in a separate layer, something like this:

    ggplot(long_data, aes(x = Percentage, y = reorder(Cell_Type, Percentage))) +
      geom_density_ridges(scale = 2, alpha = 0.2) +
      geom_point(aes(col = outlier))
    


    If you want to jitter:

    ggplot(long_data, aes(x = Percentage, y = reorder(Cell_Type, Percentage))) +
      geom_density_ridges(scale = 2, alpha = 0.2) +
      geom_jitter(aes(col = outlier), position = position_jitter(height = 0.1))
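
A slightly fuller sketch of the jittered version, in case it helps. It runs on a small made-up data frame (`toy` is hypothetical and only mimics the column names of long_data), jitters vertically only so each point stays at its true Percentage, fixes the jitter seed so the plot is reproducible, and picks the two legend colors by hand. The colors and the legend title are arbitrary choices, not something from the question.

```r
library(ggplot2)
library(ggridges)

# Hypothetical stand-in for long_data: same column names, made-up values.
set.seed(42)
toy <- data.frame(
  Cell_Type  = factor(rep(c("A", "B", "C"), each = 8)),
  Percentage = c(rnorm(8, 10, 2), rnorm(8, 15, 3), rnorm(8, 20, 2)),
  outlier    = factor(rep(c("no", "no", "no", "yes"), times = 6),
                      levels = c("no", "yes"))
)

p <- ggplot(toy, aes(x = Percentage, y = reorder(Cell_Type, Percentage))) +
  # Ridges are computed per cell type only; outlier is not mapped here.
  geom_density_ridges(scale = 2, alpha = 0.2) +
  # Points go in their own layer, so colour can follow any variable.
  # width = 0 leaves x untouched; seed makes the vertical jitter repeatable.
  geom_point(
    aes(colour = outlier),
    position = position_jitter(width = 0, height = 0.1, seed = 1)
  ) +
  # Arbitrary example colours, named by factor level.
  scale_colour_manual(values = c(no = "grey40", yes = "red"), name = "Outlier") +
  labs(x = "Percentage", y = "Cell Type")

print(p)
```

To try it on the real data, replace `toy` with long_data; nothing else should need to change, since the column names match.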
    

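
On why the "yes" rows vanish in the original plot, here is one possible explanation, offered as an assumption rather than something the ggridges documentation states. By default ggplot2 groups a layer by the interaction of all discrete variables mapped in it, so mapping point_color to outlier may split each cell type into a "no" subgroup and a "yes" subgroup. The base-R check below rebuilds just the two factor columns from the dput output in the question (the object name `groups` is made up for this sketch) and counts the rows in each combination.

```r
# Rebuild only Cell_Type and outlier from the question's data.
cell_types <- c("CD4..T.cells", "CD4..Tcm", "CD8..Tcm", "CD8..Tem",
                "Class.switched.memory.B.cells", "Monocytes", "MPP",
                "naive.B.cells", "NK.cells", "Plasma.cells")

# Rows flagged "yes" in the original outlier column.
outlier <- rep("no", 40)
outlier[c(11, 16, 23, 28, 35)] <- "yes"

groups <- data.frame(
  Cell_Type = factor(rep(cell_types, each = 4), levels = cell_types),
  outlier   = factor(outlier, levels = c("no", "yes"))
)

# Rows per Cell_Type x outlier combination.
counts <- table(groups$Cell_Type, groups$outlier)
print(counts)
```

No cell type has more than one "yes" row, so if the grouping really is split this way, each "yes" subgroup would hold a single observation, which may be too few for a density estimate. That would be consistent with those points being dropped, and it is another reason to draw the points in a separate layer as above.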