Using y-axis values to create a secondary x-axis in ggplot2


I would like to create a dot plot with percentiles that looks something like this:

enter image description here

Here is the ggplot2 code I used to create the dot plot. There are two things I'd like to change:

  1. I can plot the percentile values on the y-axis, but I want them on the x-axis (as shown in the graph above). Note that the coordinates are flipped.
  2. The axes don't display a label for the minimum value (for example, the percentile axis labels start at 25 when they should start at 0).

# loading needed libraries
library(tidyverse)
library(ggstatsplot)

# creating dataframe with mean mileage per manufacturer
cty_mpg <- ggplot2::mpg %>%
  dplyr::group_by(.data = ., manufacturer) %>%
  dplyr::summarise(.data = ., mileage = mean(cty, na.rm = TRUE)) %>%
  dplyr::rename(.data = ., make = manufacturer) %>%
  dplyr::arrange(.data = ., mileage) %>%
  dplyr::mutate(.data = ., make = factor(x = make, levels = .$make)) %>%
  dplyr::mutate(
    .data = .,
    percent_rank = (trunc(rank(mileage)) / length(mileage)) * 100
  ) %>%
  tibble::as_data_frame(x = .)

# plot
ggplot2::ggplot(data = cty_mpg, mapping = ggplot2::aes(x = make, y = mileage)) +
  ggplot2::geom_point(col = "tomato2", size = 3) + # Draw points
  ggplot2::geom_segment(
    mapping = ggplot2::aes(
      x = make,
      xend = make,
      y = min(mileage),
      yend = max(mileage)
    ),
    linetype = "dashed",
    size = 0.1
  ) + # Draw dashed lines
  ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(trans = ~(trunc(rank(.)) / length(.)) * 100, name = "percentile")) +
  ggplot2::coord_flip() +
  ggplot2::labs(
    title = "City mileage by car manufacturer",
    subtitle = "Dot plot",
    caption = "source: mpg dataset in ggplot2"
  ) +
  ggstatsplot::theme_ggstatsplot()

Created on 2018-08-17 by the reprex package (v0.2.0.9000).


Solution

  • I am not 100% sure I have understood what you want, but below is my attempt to reproduce the first picture with the mpg data:

    library(ggplot2)
    
    # mean city mileage per manufacturer, sorted, with a 1..n rank
    data <- aggregate(cty~manufacturer, mpg, FUN = mean)
    data <- data.frame(data[order(data$cty), ], rank=1:nrow(data))
    
    g <- ggplot(data, aes(y = rank, x = cty))
    g <- g + geom_point(size = 2)
    # primary y-axis: one break per manufacturer; secondary y-axis: 0-100%
    g <- g + scale_y_continuous(name = "Manufacturer", labels = data$manufacturer, breaks = data$rank,
                                sec.axis = dup_axis(name = NULL,
                                                    breaks = seq(1, nrow(data), (nrow(data)-1)/4),
                                                    labels = 25 * 0:4))
    g <- g + scale_x_continuous(name = "Mileage", limits = c(10, 25),
                                sec.axis = dup_axis(name = NULL))
    g <- g + theme_classic()
    g <- g + theme(panel.grid.major.y = element_line(color = "black", linetype = "dotted"))
    
    print(g)
    

    That produces:

    enter image description here

    data <- aggregate(cty~manufacturer, mpg, FUN = mean)
    data <- data.frame(data[order(data$cty), ], rank=1:nrow(data))
    

    These two lines generate the data for the graph. Basically we need the manufacturers, the mileage (average of cty by manufacturer) and the rank.

    g <- g + scale_y_continuous(name = "Manufacturer", labels = data$manufacturer, breaks = data$rank,
                                sec.axis = dup_axis(name = NULL,
                                                    breaks = seq(1, nrow(data), (nrow(data)-1)/4),
                                                    labels = 25 * 0:4))
    

    Note that the scale here uses rank, not the manufacturer column. To display the manufacturer names, you must pass them through the labels argument and force a break at every value (see the breaks argument).

    The second y-axis is generated with the sec.axis argument. This is straightforward with dup_axis, which duplicates the primary axis. By replacing its labels and breaks, you can display the percentage values instead.
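
    To make the placement of those percentage labels concrete, here is a minimal base-R sketch. The helper name percentile_breaks is hypothetical (it is not part of ggplot2); it only restates the seq() call above as a linear map from percent to position on the 1..n rank scale.

    ```r
    # Hypothetical helper (not a ggplot2 function): positions on a 1..n rank
    # scale at which the 0%, 25%, 50%, 75% and 100% labels are drawn.
    percentile_breaks <- function(n, percents = 25 * 0:4) {
      # linear map: 0% -> rank 1, 100% -> rank n
      1 + (n - 1) * percents / 100
    }

    n <- 15  # number of manufacturers in ggplot2::mpg
    percentile_breaks(n)
    # 1.0  4.5  8.0 11.5 15.0, the same positions as seq(1, n, (n - 1) / 4)
    ```

    Because 0% is mapped to rank 1, the lowest-ranked manufacturer sits exactly on the 0% label.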

    g <- g + theme(panel.grid.major.y = element_line(color = "black", linetype = "dotted"))
    

    The horizontal lines are just the major grid lines. In my opinion, this is much easier to manipulate than geom_segment.

    Regarding your question 1, you can flip the coordinates easily using coord_flip, with minor adjustments. Replace the following line:

    g <- g + theme(panel.grid.major.y = element_line(color = "black", linetype = "dotted"))
    

    with the following two statements:

    g <- g + coord_flip()
    g <- g + theme(panel.grid.major.x = element_line(color = "black", linetype = "dotted"),
                   axis.text.x = element_text(angle = 90, hjust = 1))
    

    Which produces:

    enter image description here

    Regarding your question 2, the problem is that the value 0% is outside the limits: the rank starts at 1, so the smallest percentile is 100/n rather than 0. You can solve this either by changing the way you calculate the percentage (starting from zero rather than from one), or by extending the limits of your plot to include zero, in which case no point will be associated with 0%.
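
    As a rough illustration of the first option, the sketch below compares the formula from the question with a zero-based variant. The mileage values are made up for the example, and the variable names are only illustrative.

    ```r
    # Made-up mileage values, already distinct so there are no tied ranks
    mileage <- c(11.3, 12.5, 13.1, 15.0, 24.4)
    n <- length(mileage)

    # Formula from the question: the smallest value gets 100 / n, never 0
    pct_from_one <- trunc(rank(mileage)) / n * 100

    # Zero-based variant: smallest value -> 0, largest value -> 100
    pct_from_zero <- (rank(mileage) - 1) / (n - 1) * 100

    pct_from_one   # 20 40 60 80 100
    pct_from_zero  # 0 25 50 75 100
    ```

    If dplyr is already loaded, dplyr::percent_rank() applies the same zero-based rescaling (returning values between 0 and 1), so multiplying its result by 100 should give the second vector.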