Tags: r, dataframe, ggplot2, plot, dplyr

How to plot a scatter plot for values in each category?


I have a dataset containing precipitation and wind speed data. I've binned the wind speed (max_ws) into five categories of roughly equal size using the cut_number function, and the rainfall into five categories: 0 mm, 0.01 to 2.50 mm, 2.51 to 5.00 mm, 5.01 to 7.50 mm, and >7.50 mm.
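
The code that produced these categories isn't shown, so the following is only a sketch of one way to build them, using made-up values in a hypothetical toy tibble. It assumes cut_number() from ggplot2 for the wind speed bins and base R cut() for the rainfall bins. With cut(), the levels come out in the order of the breaks, so >7.50 ends up last.

```r
library(dplyr)
library(ggplot2)  # cut_number() is exported by ggplot2

# Made-up values for illustration only (not the data from the question)
toy <- tibble(
  max_ws        = c(0, 1.1, 2.4, 2.9, 3.1, 3.8, 4.1, 4.7, 5.8, 8),
  precipitation = c(0, 0.1, 0, 3, 4.8, 9.8, 0, 2.3, 6, 0)
)

toy <- toy %>%
  mutate(
    # Five bins holding (approximately) the same number of observations
    ws_category = cut_number(max_ws, n = 5),
    # Intervals are right-closed: (-Inf,0], (0,2.5], (2.5,5], (5,7.5], (7.5,Inf]
    rain_category = cut(
      precipitation,
      breaks = c(-Inf, 0, 2.5, 5, 7.5, Inf),
      labels = c("0", "0.01 to 2.50", "2.51 to 5.00", "5.01 to 7.50", ">7.50")
    )
  )

levels(toy$rain_category)
```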

I would like to create a plot that shows the amount of precipitation for each combination of rain_category and ws_category. The x-axis would be rain_category, the y-axis would be ws_category, and the data points in the scatter plot would represent the precipitation column.

Here is a reproducible example of my data:

df <- structure(list(max_ws = c(2.4, 1.1, 0, 2.9, 3.8, 4.1, 3.9, 3.8, 
                                2.6, 3.8, 4.2, 2.1, 2.9, 1.5, 2, 2.2, 3.1, 2.9, 3.1, 4.3, 4.1, 
                                4.7, 3.1, 2.7, 5.7, 5.8, 3.8, 2.9, 0.3, 1.6, 0.8, 0, 1.9, 1.2, 
                                4.3, 0.9, 2.4, 3.7, 4.8, 4.5, 3.5, 0, 2.3, 3.2, 3.2, 5, 3.3, 
                                3.6, 2.4, 2.8, 4.7, 5.3, 4.4, 1.6, 5.3, 5.5, 4.6, 2.7, 3.5, 2.5, 
                                2.3, 3.5, 4.7, 3.8, 4.4, 2.8, 5.4, 3.3, 4.7, 4, 3.3, 3.1, 2, 
                                1.7, 2.7, 3.2, 3, 4.6, 4, 3.6, 3.2, 4.5, 3.8, 4.1, 3.3, 2, 3.2, 
                                4.1, 4.3, 4.6, 4.5, 3.9, 3.1, 3.9, 4.6, 3.7, 3.4, 4.9, 3.2, 3.8, 
                                4.6, 4, 1.9, 2.4, 3.3, 4.4, 3.4, 5.1, 4.6, 4.9, 3.4, 4, 3.6, 
                                4.9, 4, 5.3, 5.6, 4.4, 5.5, 5.9, 5.8, 3.9, 5.1, 3.8, 3.3, 4.8, 
                                3.7, 3.6, 4.3, 3, 4.8, 5.6, 4.3, 3, 4.8, 2.7, 4.4, 2.5, 4.5, 
                                2.8, 3.4, 4.7, 4.1, 4.2, 4.5, 4.9, 4.5, 2.9, 3.2, 3, 1.6, 2.4, 
                                3.3, 2.8, 3, 1.9, 3, 3.8, 3.1, 4.9, 5.3, 3.6, 3.8, 3.8, 2.5, 
                                3.5, 3.8, 4.2, 4.9, 4, 3.9, 4, 3.9, 5.3, 4.5, 4.5, 4.8, 3.3, 
                                2.7, 3.3, 3.5, 3.9, 4.8, 3.3, 2.9, 3, 4.5, 4.2, 3.6, 5.5, 6, 
                                4.4, 4.6, 4.7, 2.9, 3.7, 2.5, 4.1, 3.2, 4.6, 4.7, 2.9, 2.9, 1.7, 
                                3.6, 3.1, 3.6, 4.1, 3.4, 2.8, 3.3, 4.2, 3, 3.3, 2.4, 3.6, 2.8, 
                                2.9, 4.3, 4, 3, 2, 2.3, 3.7, 3.8, 4.4, 4.3, 4.7, 3.5, 2.6, 3.9, 
                                3.5, 2.8, 2.4, 3.7, 3.2, 2.5, 4.8, 3.7, 3.4, 2.9, 3.4, 2.5, 4, 
                                2.2, 3.7, 2.6, 2.6, 2.3, 2.6, 3.1, 2.5, 3.1, 3.2, 3.9, 3.1, 2, 
                                4.7, 2.3, 3.7, 3.3, 3.7, 3, 4.1, 3.6, 2.5, 3.3, 5.6, 4.5, 3.3, 
                                3.6, 3.7, 4, 3.9, 4.2, 3.3, 4.5, 2.9, 6.2, 3, 3.7, 2.1, 3.2, 
                                1.9, 3.3, 4, 3.6, 4.3, 3.7, 5.2, 3.9, 3.7, 2.9, 2.4, 3.8, 3.2, 
                                3.1, 2.5, 2.8, 3.2, 3.8, 3.2, 4.6, 3.3, 4.2, 3.9, 4.4, 4.4, 3.6, 
                                3, 4, 3.4, 4.3, 3.5, 2.5, 3.7, 3.3, 3.3, 1.2, 1.9, 2.9, 3.4, 
                                1.4, 2.7, 3, 4.2, 5, 2, 3.7, 8, 5.7, 1.8, 3.3, 3.8, 2.7, 4.5, 
                                3.6, 4.2, 5.2, 4.1, 4.9, 4.1, 2.9, 4.8, 4.9, 3.7, 2.7, 2.8, 5.2, 
                                3.9, 3, 2.8, 1.4, 2.9, 5.9, 5.2, 4.2, 4.3, 6, 5.6, 4.1, 5.5, 
                                4.2, 4.9, 5.7, 5.8), precipitation = c(0.1, 0, 0, 0, 0, 0, 0, 
                                                                       0, 0, 0, 3, 0, 0, 0, 0, 0, 2.8, 0.6, 4.8, 9.8, 2.3, 0, 0, 0, 
                                                                       0.1, 2.3, 0.2, 0, 0, 0, 0, 0, 0, 0, 0.2, 0.1, 4.3, 10.4, 3, 5.6, 
                                                                       0, 0, 0, 0, 0.5, 3.3, 4.2, 2.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                                                                       1.7, 0.1, 0, 0, 0, 2.5, 0.1, 0, 10, 0, 0.8, 0, 0, 0, 0, 0, 0, 
                                                                       0, 0.6, 0.3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.6, 0.4, 0.5, 0.5, 
                                                                       0.1, 0, 0, 0, 2.2, 1.9, 0, 8, 6, 0, 3.6, 0, 0, 0, 0.3, 0, 1, 
                                                                       1.1, 1.5, 1.1, 4.3, 0.9, 0.8, 0, 0.3, 2.7, 0.7, 0, 0, 0, 3.8, 
                                                                       0, 0.1, 0, 0.8, 0, 0.1, 12.1, 4.2, 0, 0, 0, 0, 3.1, 2.4, 0, 0.4, 
                                                                       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.8, 19, 1, 0, 0, 3, 0, 4.8, 
                                                                       0.2, 2.9, 0.1, 1.6, 1.5, 0, 0, 0, 2, 5.3, 0, 6, 0, 0, 2.5, 0.4, 
                                                                       4.4, 20.7, 6.1, 3.4, 2.8, 0, 0.2, 3.7, 0.6, 0, 0, 0, 4.2, 0, 
                                                                       0, 7.3, 10.3, 1, 4.3, 0.2, 4.2, 0.7, 4, 7.7, 3.1, 19.1, 2.6, 
                                                                       0.9, 0, 0, 0, 0, 0, 0, 11.2, 0.6, 1.9, 4.4, 0, 0, 0.4, 0.6, 0, 
                                                                       5.4, 2.6, 3.4, 5, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0.4, 0, 0, 13.9, 
                                                                       0, 0.1, 2, 1.9, 3.3, 1.5, 0, 0, 0, 5.5, 0, 0, 0, 0, 0, 0, 0, 
                                                                       0, 0, 0.1, 4.5, 0.9, 0.2, 3.9, 0, 0, 0, 0.7, 2, 0, 6.7, 1.4, 
                                                                       8.8, 10.9, 2, 3.8, 10.1, 0.1, 0, 0, 3.3, 0, 5.2, 1.9, 24.9, 2, 
                                                                       1.9, 0.1, 0.9, 0, 0, 10.5, 3.4, 0.2, 1.1, 2.1, 0.5, 0, 0, 0, 
                                                                       0, 0, 5.4, 0.8, 0.2, 0, 0, 0.3, 7.1, 0.2, 0.1, 3.9, 1.7, 3.2, 
                                                                       3.6, 0.4, 4.8, 0.3, 1, 0.9, 1.1, 0, 0, 0, 0, 0, 0, 2.3, 1, 0, 
                                                                       0, 0, 0, 0, 2.2, 0.1, 1.7, 0.3, 0, 0.7, 0, 1.9, 0.1, 3.2, 1.9, 
                                                                       1.4, 0, 0, 7.3, 8.7, 1.2, 5, 2.2, 0, 8.6, 3.7, 2.3, 5.1, 0.2, 
                                                                       0, 0, 3.5, 22, 1, 8.7, 2.6, 3.5, 0.2, 0.7, 0.9, 6.3, 7.8), ws_category = structure(c(1L, 
                                                                                                                                                            1L, 1L, 2L, 3L, 4L, 4L, 3L, 2L, 3L, 4L, 1L, 2L, 1L, 1L, 1L, 2L, 
                                                                                                                                                            2L, 2L, 4L, 4L, 5L, 2L, 2L, 5L, 5L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                            1L, 4L, 1L, 1L, 3L, 5L, 4L, 3L, 1L, 1L, 2L, 2L, 5L, 3L, 3L, 1L, 
                                                                                                                                                            2L, 5L, 5L, 4L, 1L, 5L, 5L, 4L, 2L, 3L, 1L, 1L, 3L, 5L, 3L, 4L, 
                                                                                                                                                            2L, 5L, 3L, 5L, 4L, 3L, 2L, 1L, 1L, 2L, 2L, 2L, 4L, 4L, 3L, 2L, 
                                                                                                                                                            4L, 3L, 4L, 3L, 1L, 2L, 4L, 4L, 4L, 4L, 4L, 2L, 4L, 4L, 3L, 3L, 
                                                                                                                                                            5L, 2L, 3L, 4L, 4L, 1L, 1L, 3L, 4L, 3L, 5L, 4L, 5L, 3L, 4L, 3L, 
                                                                                                                                                            5L, 4L, 5L, 5L, 4L, 5L, 5L, 5L, 4L, 5L, 3L, 3L, 5L, 3L, 3L, 4L, 
                                                                                                                                                            2L, 5L, 5L, 4L, 2L, 5L, 2L, 4L, 1L, 4L, 2L, 3L, 5L, 4L, 4L, 4L, 
                                                                                                                                                            5L, 4L, 2L, 2L, 2L, 1L, 1L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 5L, 5L, 
                                                                                                                                                            3L, 3L, 3L, 1L, 3L, 3L, 4L, 5L, 4L, 4L, 4L, 4L, 5L, 4L, 4L, 5L, 
                                                                                                                                                            3L, 2L, 3L, 3L, 4L, 5L, 3L, 2L, 2L, 4L, 4L, 3L, 5L, 5L, 4L, 4L, 
                                                                                                                                                            5L, 2L, 3L, 1L, 4L, 2L, 4L, 5L, 2L, 2L, 1L, 3L, 2L, 3L, 4L, 3L, 
                                                                                                                                                            2L, 3L, 4L, 2L, 3L, 1L, 3L, 2L, 2L, 4L, 4L, 2L, 1L, 1L, 3L, 3L, 
                                                                                                                                                            4L, 4L, 5L, 3L, 2L, 4L, 3L, 2L, 1L, 3L, 2L, 1L, 5L, 3L, 3L, 2L, 
                                                                                                                                                            3L, 1L, 4L, 1L, 3L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 4L, 2L, 1L, 
                                                                                                                                                            5L, 1L, 3L, 3L, 3L, 2L, 4L, 3L, 1L, 3L, 5L, 4L, 3L, 3L, 3L, 4L, 
                                                                                                                                                            4L, 4L, 3L, 4L, 2L, 5L, 2L, 3L, 1L, 2L, 1L, 3L, 4L, 3L, 4L, 3L, 
                                                                                                                                                            5L, 4L, 3L, 2L, 1L, 3L, 2L, 2L, 1L, 2L, 2L, 3L, 2L, 4L, 3L, 4L, 
                                                                                                                                                            4L, 4L, 4L, 3L, 2L, 4L, 3L, 4L, 3L, 1L, 3L, 3L, 3L, 1L, 1L, 2L, 
                                                                                                                                                            3L, 1L, 2L, 2L, 4L, 5L, 1L, 3L, 5L, 5L, 1L, 3L, 3L, 2L, 4L, 3L, 
                                                                                                                                                            4L, 5L, 4L, 5L, 4L, 2L, 5L, 5L, 3L, 2L, 2L, 5L, 4L, 2L, 2L, 1L, 
                                                                                                                                                            2L, 5L, 5L, 4L, 4L, 5L, 5L, 4L, 5L, 4L, 5L, 5L, 5L), levels = c("[0,2.5]", 
                                                                                                                                                                                                                            "(2.5,3.2]", "(3.2,3.8]", "(3.8,4.6]", "(4.6,11.6]"), class = "factor"), 
                     rain_category = structure(c(3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                                 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 4L, 3L, 4L, 1L, 3L, 2L, 2L, 
                                                 2L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 4L, 1L, 
                                                 4L, 5L, 2L, 2L, 2L, 2L, 3L, 4L, 4L, 3L, 2L, 2L, 2L, 2L, 2L, 
                                                 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 3L, 3L, 2L, 1L, 2L, 
                                                 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 
                                                 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 3L, 3L, 
                                                 2L, 1L, 5L, 2L, 4L, 2L, 2L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 4L, 
                                                 3L, 3L, 2L, 3L, 4L, 3L, 2L, 2L, 2L, 4L, 2L, 3L, 2L, 3L, 2L, 
                                                 3L, 1L, 4L, 2L, 2L, 2L, 2L, 4L, 3L, 2L, 3L, 2L, 3L, 2L, 2L, 
                                                 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 1L, 3L, 2L, 2L, 4L, 2L, 4L, 
                                                 3L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 3L, 5L, 2L, 5L, 2L, 2L, 3L, 
                                                 3L, 4L, 1L, 5L, 4L, 4L, 2L, 3L, 4L, 3L, 2L, 2L, 2L, 4L, 2L, 
                                                 2L, 5L, 1L, 3L, 4L, 3L, 4L, 3L, 4L, 1L, 4L, 1L, 4L, 3L, 2L, 
                                                 2L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 4L, 2L, 2L, 3L, 3L, 2L, 5L, 
                                                 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 2L, 2L, 
                                                 1L, 2L, 3L, 3L, 3L, 4L, 3L, 2L, 2L, 2L, 5L, 2L, 2L, 2L, 2L, 
                                                 2L, 2L, 2L, 2L, 2L, 3L, 4L, 3L, 3L, 4L, 2L, 2L, 2L, 3L, 3L, 
                                                 2L, 5L, 3L, 1L, 1L, 3L, 4L, 1L, 3L, 2L, 2L, 4L, 2L, 5L, 3L, 
                                                 1L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 
                                                 2L, 2L, 2L, 5L, 3L, 3L, 2L, 2L, 3L, 5L, 3L, 3L, 4L, 3L, 4L, 
                                                 4L, 3L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
                                                 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 4L, 
                                                 3L, 3L, 2L, 2L, 5L, 1L, 3L, 4L, 3L, 2L, 1L, 4L, 3L, 5L, 3L, 
                                                 2L, 2L, 4L, 1L, 3L, 1L, 4L, 4L, 3L, 3L, 3L, 5L, 1L), levels = c(">7.50", 
                                                                                                                 "0", "0.01 to 2.50", "2.51 to 5.00", "5.01 to 7.50"), class = "factor")), row.names = c(NA, 
                                                                                                                                                                                                         -366L), class = c("tbl_df", "tbl", "data.frame"))

When I try to plot, I get only one data point in each category, but there are many data points in my data. What step am I missing?

p <- ggplot(df, aes(x = rain_category, y = ws_category, fill = precipitation)) +
  geom_point(size = 3) +
  theme_minimal() 
p



Solution

  • Because both variables are binned, every point lands exactly on an intersection of ws_category and rain_category, so the points in each cell are drawn on top of one another. The whiteboard sketch resembles continuous data. If you want to keep the binned axes but give a sense of the number of points in each cell of the grid, you can add random noise (jitter). Be aware that this might lead to incorrect conclusions if it gives someone the impression that a point further to the right within a bin has a greater value. The code below also maps precipitation to color rather than fill, because the default point shape has no fill.
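
    To check that overlap is really the cause, you can count the rows in each cell of the grid. This is a minimal sketch on a hypothetical toy tibble standing in for df (the values are made up); count() is from dplyr.

    ```r
    library(dplyr)

    # Made-up stand-in for df: five rows that fall into only three cells
    toy <- tibble(
      ws_category   = factor(c("low", "low", "low", "high", "high")),
      rain_category = factor(c("0", "0", "0", "0", ">7.50")),
      precipitation = c(0, 0, 0, 0, 9.8)
    )

    # One output row per occupied cell; n is the number of stacked points
    cells <- count(toy, ws_category, rain_category)
    cells
    ```

    Run against the real df, the same count() call shows how many points are hidden behind each visible dot.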

    Edit: Following a question raised in the comments, I added a plot of summarized data at the end.

    library(tidyverse)
    
    ggplot(df, aes(x = rain_category, y = ws_category, color = precipitation)) +
      geom_point(size = 1,
                 position = position_jitterdodge(dodge.width = 0.25,
                                                 jitter.height = 0.25)) +
      theme_minimal() 
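
    If the jittered plot needs to look the same on every run, one option is position_jitter() with a fixed seed. This is a sketch on made-up toy data, not the plot above; the width, height, and seed values are arbitrary choices.

    ```r
    library(ggplot2)

    # Made-up values for illustration only
    toy <- data.frame(
      rain_category = factor(c("0", "0", "0", ">7.50", ">7.50")),
      ws_category   = factor(c("low", "low", "high", "high", "high")),
      precipitation = c(0, 0, 0, 9.8, 12.1)
    )

    # seed fixes the random offsets, so repeated renders are identical
    p_jitter <- ggplot(toy, aes(rain_category, ws_category, color = precipitation)) +
      geom_point(size = 1,
                 position = position_jitter(width = 0.2, height = 0.2, seed = 1)) +
      theme_minimal()
    p_jitter
    ```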
    

    Consider faceting, as suggested by @MrFlick in a comment. Each pair of categories is plotted in its own panel, allowing you to plot points relative to each other using the actual measurements.

    ggplot(df, aes(precipitation, max_ws)) +
      geom_point() +
      facet_grid(ws_category ~ rain_category,
                 scales = 'free')
    

    Summarize precipitation data for a plot

    Before creating the plot, this calculates the average precipitation and counts the number of rows (observations) in each pairing of ws_category and rain_category. fct_relevel() changes the order of the levels, putting >7.50 at the end.

    library(tidyverse)
    
    df %>%
      summarise(avg_precipitation = mean(precipitation),
                n = n(),
                .by = c(ws_category, rain_category)) %>%
      ggplot(aes(x = fct_relevel(rain_category, ">7.50", after = Inf), y = ws_category, label = n, fill = avg_precipitation)) +
      geom_tile() +
      geom_text(size = 10) +
      labs(x = 'Rain Category',
           y = 'WS Category',
           fill = 'Average Precipitation')
    

    Created on 2023-10-07 with reprex v2.0.2
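
    The .by argument of summarise() was added in dplyr 1.1.0. On an older version, the same per-cell summary can be written with group_by(). This is a sketch on a hypothetical toy tibble with made-up values, not the data above.

    ```r
    library(dplyr)

    # Made-up values for illustration only
    toy <- tibble(
      ws_category   = factor(c("low", "low", "low", "high", "high")),
      rain_category = factor(c("0.01 to 2.50", "0.01 to 2.50", "0", "0", ">7.50")),
      precipitation = c(0.5, 1.5, 0, 0, 9.8)
    )

    # group_by() + summarise() instead of summarise(.by = ...);
    # .groups = "drop" returns an ungrouped tibble
    cell_summary <- toy %>%
      group_by(ws_category, rain_category) %>%
      summarise(avg_precipitation = mean(precipitation),
                n = n(),
                .groups = "drop")
    cell_summary
    ```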