
GGRidges: single density plot too high


I have a data frame with a single dependent variable and multiple independent variables. The groups of relationships come from different tests (though this is not important for the problem I ran into).

I'm using the ggridges package to draw one graph with a density plot for each relationship. When a relationship has only one value, ggridges draws a point instead of a density plot. My issue is that, in that case, the density plot below spills into the space above it, probably because ggridges does not see another density plot there and expands the space that the density plot below can take. The "scale" option can be used to avoid overlap between two density plots, but not between a density plot and a point (at least I think so).

If I set scale = 0.5, the problem goes away, but this is not ideal because every density plot becomes smaller, including the ones that do not overlap with anything.

Below is a reproducible example that produces a graph with this problem. Thanks to anyone who can help.

library(magrittr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(forcats)
library(ggplot2)
library(ggridges)
library(RCurl)
#> Loading required package: bitops

y_ex <- c("y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1","y1")
x_ex <- c("x1","x1","x2","x3","x3","x3","x3","x3","x3","x4","x4","x4","x5","x5","x5","x5","x5","x5","x5","x5","x5","x5","x5")
value_ex <- c(0.26,0.40,0.47,0.72,0.71,0.69,0.74,0.73,0.24,0.39,0.43,0.46,0.21,0.18,0.14,0.10,0.16,-0.10,-0.11,0.56,0.50,0.49,0.43)

data_ex <- data.frame(y_ex,x_ex,value_ex)

r_ex <- data_ex %>% 
  dplyr::mutate(x_ex = forcats::fct_reorder(x_ex, desc(value_ex), .fun = mean))

r_ex %>% 
  ggplot(aes(x = value_ex, y = x_ex)) + 
  ggtitle(paste0("Predictors of ", unique(y_ex))) +
  geom_density_ridges(fill = "royalblue",
                      scale = 0.9,
                      color = NA, 
                      alpha = 0.7,
                      rel_min_height = 0.01) +
  geom_point(size = 0.5, alpha = 0.5, pch = 16) +
  geom_point(data = r_ex %>% group_by(x_ex) %>% dplyr::summarise(value_ex = mean(value_ex)),
             color = "firebrick",
             pch = 16,
             alpha = 0.5) +
  scale_y_discrete("") +
  scale_x_continuous("", limits = c(0, 1)) +
  theme_grey(base_size = 16, base_family = "serif") +
  theme(plot.title = element_text(hjust = 0.5,
                                  lineheight = .8, 
                                  face = "bold",
                                  margin = margin(10, 0, 20, 0),
                                  color = "gray15"),
        legend.position = "none") 
#> Picking joint bandwidth of 0.0453
#> Warning: Removed 2 rows containing non-finite values (stat_density_ridges).
#> Warning: Removed 2 rows containing missing values (geom_point).



Solution

  • What is happening is that the scaling heuristic is getting confused by the missing data, i.e. the levels for which no density is drawn. The heuristic takes the total range of the baseline y values and divides it by the number of groups minus 1, as can be seen in the ggridges source.

    In your case, the scaling heuristic produces a reference scale that is exactly a factor of 2 too large, and hence you should use scale = 0.45 to get the same effect you would get with scale = 0.9 if there weren't any missing levels.

    Note that all areas need to be scaled together, because the areas under the distributions need to be of the same size (1 in some units). Your x5 distributions are not as tall because they are bimodal and hence much wider.
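
To make the factor of 2 concrete, here is a rough base-R sketch of the arithmetic described above. It is not taken from ggridges itself. It assumes (this is not stated in the answer) that groups with fewer than three observations get no density, so that only x3, x4 and x5 are drawn as ridges. The names `min_points`, `baselines`, `reference` and `corrected_scale` are made up for illustration.

```r
# Data from the question
x_ex <- c(rep("x1", 2), "x2", rep("x3", 6), rep("x4", 3), rep("x5", 11))
value_ex <- c(0.26, 0.40, 0.47, 0.72, 0.71, 0.69, 0.74, 0.73, 0.24,
              0.39, 0.43, 0.46, 0.21, 0.18, 0.14, 0.10, 0.16, -0.10,
              -0.11, 0.56, 0.50, 0.49, 0.43)

# Order the levels by decreasing mean, mirroring
# fct_reorder(x_ex, desc(value_ex), .fun = mean) from the question
f <- reorder(factor(x_ex), -value_ex, FUN = mean)

# ASSUMPTION: groups with fewer than `min_points` observations get no density
min_points <- 3
n_obs <- as.vector(table(f))             # observation counts, in level order
baselines <- which(n_obs >= min_points)  # y positions of the ridges that are drawn

# Heuristic as described in the answer:
# total range of baseline y values / (number of groups - 1)
reference <- diff(range(baselines)) / (length(baselines) - 1)

# Adjacent levels are really 1 unit apart, so the intended scale
# has to be divided by the inflated reference spacing
intended_scale <- 0.9
corrected_scale <- intended_scale / reference

print(levels(f))
print(baselines)
print(reference)
print(corrected_scale)
```

Under that assumption, the drawn ridges sit at positions 1, 3 and 5, the reference spacing comes out as 2 instead of 1, and the corrected value is 0.45. That value would replace scale = 0.9 in the geom_density_ridges() call from the question.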