
R ggplot: Size breaks for continuous variable

I am trying to create a scatterplot with different point sizes for a continuous variable R95p_e. However, I'd like to bin or create breaks for the continuous variable according to:

iris <- iris %>% mutate(discrete_var2 = cut(R95p_e, c(-0.5,0.06,0.08,0.1,0.12,0.14,0.16,0.18,0.2,3)))

Creating bins didn't work for me, however: it gave me Error: Continuous value supplied to discrete scale (also with as.numeric(R95p_e) and/or as.factor(R95p_e)). So instead I tried this:

iris %>% 
  ggplot(aes(CDD_d, R95p_d, color = discrete_var, size = R95p_e)) +
  geom_quadrant_lines(linetype = "solid") +
  geom_point(alpha = 0.5) +
  scale_color_manual(values = col) +
  scale_size_continuous(breaks = c(1,2,3,4,5,6,7,8)) +
  scale_x_continuous(limits = symmetric_limits) +
  scale_y_continuous(limits = symmetric_limits) +
  theme_minimal() + 
  geom_text(aes(label = ifelse(CDD_e > quantile(CDD_e,probs = .95,na.rm=TRUE) | R95p_e > quantile(R95p_e,probs = .95,na.rm=TRUE), as.character(NAME), ""), size = 0.1),show_guide=F) +
  labs(size="Trade-weighted exposure to extreme precipitation", colour="Trade-weighted exposure to extreme drought")

but then I end up with single-sized points.

Reproducible example:

iris <- structure(list(NAME = c("Afghanistan", "Albania", "Algeria", 
"American Samoa", "Andorra", "Angola", "Anguilla", "Antarctica", 
"Antigua and Barbuda", "Argentina"), CDD_d = c(-0.0409, 0.0349, 
0.092, NA, 0.0493, -0.087, NA, NA, NA, -0.0199), R95p_d = c(0.2219, 
0.0564, 0.1601, NA, 0.0669, 1.9369, NA, NA, NA, 0.124), CDD_e = c(0.0163842166664955, 
0.0420785596173385, 0.0301859613207384, NA, NA, 0.0132801284765419, 
NA, NA, 0.01446, 0.0979854033169376), R95p_e = c(0.200947581296687, 
0.105155138501437, 0.123916523483283, NA, NA, 0.185846581581744, 
NA, NA, 0.01664, 0.146239802993381), discrete_var = structure(c(5L, 
7L, 6L, NA, NA, 5L, NA, NA, 5L, 9L), levels = c("— -6", "-6 to -4", 
"-4 to -2", "-2 to 0", "0 to 2", "2 to 4", "4 to 6", "6 to 8", 
"8 — "), class = "factor"), discrete_var2 = structure(c(9L, 
4L, 5L, NA, NA, 8L, NA, NA, 1L, 6L), levels = c(" — -6", "6 to 8", 
"8 to 10", "10 to 12", "12 to 14", "14 to 16", "16 to 18", "18 to 20", 
"20 —"), class = "factor")), row.names = c(NA, 10L), class = "data.frame")


  • If you want to bin your variable then you have to use a discrete scale, e.g. scale_size_manual:

    library(ggplot2)
    library(ggpmisc) # provides geom_quadrant_lines() and symmetric_limits

    p <- iris |>
      ggplot(aes(CDD_d, R95p_d, color = discrete_var)) +
      geom_quadrant_lines(linetype = "solid") +
      geom_point(alpha = 0.5) +
      scale_x_continuous(limits = symmetric_limits) +
      scale_y_continuous(limits = symmetric_limits) +
      theme_minimal() +
      # label only the countries above the 95th percentile of either exposure
      geom_text(
        aes(label = ifelse(
          CDD_e > quantile(CDD_e, probs = .95, na.rm = TRUE) |
            R95p_e > quantile(R95p_e, probs = .95, na.rm = TRUE),
          as.character(NAME), ""
        )),
        show.legend = FALSE
      ) +
      labs(
        size = "Trade-weighted exposure to extreme precipitation",
        colour = "Trade-weighted exposure to extreme drought"
      )

    # binned variable: one manual size per factor level
    p +
      aes(size = discrete_var2) +
      scale_size_manual(values = 1:9)
    #> Warning: Removed 5 rows containing missing values (`geom_point()`).
    #> Warning: Removed 5 rows containing missing values (`geom_text()`).
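
    As a rough check on the binned approach: scale_size_manual(values = 1:9) needs one size per factor level, and the break points from the question produce exactly nine bins. A minimal base-R sketch, using a handful of R95p_e values rounded from the reproducible example (the names brks and binned are only illustrative):

    ```r
    # Values rounded from the R95p_e column of the reproducible example
    R95p_e <- c(0.2009, 0.1052, 0.1239, NA, 0.1858, 0.0166, 0.1462)

    # Break points from the question; cut() builds right-closed intervals by default
    brks <- c(-0.5, 0.06, 0.08, 0.1, 0.12, 0.14, 0.16, 0.18, 0.2, 3)

    # cut() returns a factor, which is what a discrete size scale expects
    binned <- cut(R95p_e, breaks = brks)

    is.factor(binned)              # a factor, not a numeric vector
    nlevels(binned)                # 10 break points give 9 bins
    table(binned, useNA = "ifany") # how many observations fall into each bin
    ```

    Ten break points give nine levels, so values = 1:9 supplies exactly one size per bin; a shorter values vector would not cover all levels.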

    And if you want to stick with the continuous variable, drop breaks= (it only controls which values are labelled in the legend, not how large the points are drawn) and set the range of point sizes via range= instead:

    p +
      aes(size = R95p_e) +
      scale_size_continuous(range = c(1, 9))
    #> Warning: Removed 5 rows containing missing values (`geom_point()`).
    #> Warning: Removed 5 rows containing missing values (`geom_text()`).
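
    Not part of the answer above, but possibly a middle ground: ggplot2 (3.3.0 and later) also has scale_size_binned(), which keeps the continuous variable in the data and bins it at the scale level. A sketch assuming only ggplot2 is loaded, with a small hypothetical data frame df built from the complete rows of the example (values rounded) and an illustrative subset of the question's break points:

    ```r
    library(ggplot2)

    # Hypothetical data frame: the five complete rows of the example, rounded
    df <- data.frame(
      CDD_d  = c(-0.0409, 0.0349, 0.092, -0.087, -0.0199),
      R95p_d = c(0.2219, 0.0564, 0.1601, 1.9369, 0.124),
      R95p_e = c(0.2009, 0.1052, 0.1239, 0.1858, 0.1462)
    )

    # size stays mapped to the continuous column; the scale does the binning
    p2 <- ggplot(df, aes(CDD_d, R95p_d, size = R95p_e)) +
      geom_point(alpha = 0.5) +
      scale_size_binned(
        breaks = c(0.12, 0.14, 0.16, 0.18, 0.2), # illustrative subset of the bins
        range = c(1, 9)                          # smallest and largest point size
      )

    p2
    ```

    Compared with cut() plus scale_size_manual(), this avoids creating an extra factor column, at the cost of less control over the individual sizes.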