I have a dataset containing precipitation and wind speed data. I've categorized the wind speed (max_ws) into five equal-sized categories using the cut_number function and the rainfall into five categories: 0 mm, 0.01 to 2.50 mm, 2.51 to 5.00 mm, 5.01 to 7.50 mm, and >7.50 mm.
I would like to create a plot that shows the amount of precipitation for each combination of rain_category and ws_category. The x-axis would be rain_category, the y-axis would be ws_category, and the data points in the scatter plot would represent the precipitation column.
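For context, the binning described above could be reproduced roughly as follows. This is only a sketch under assumptions: the `toy` data frame and its values are made up, and using `cut()` with explicit breaks and labels is one possible way to build the rain categories, not necessarily how the categories in the data below were created.

```r
library(ggplot2)  # provides cut_number()

# Hypothetical toy values, not the real data
toy <- data.frame(
  max_ws        = 1:10,
  precipitation = c(0, 0.1, 2.5, 3, 7.5, 7.6, 20, 0, 0, 5.01)
)

# Five bins holding (approximately) equal numbers of observations
toy$ws_category <- cut_number(toy$max_ws, 5)

# Rain bins with explicit breaks; the labels are given in the intended order,
# so the factor levels are not sorted alphabetically
toy$rain_category <- cut(
  toy$precipitation,
  breaks = c(-Inf, 0, 2.5, 5, 7.5, Inf),
  labels = c("0", "0.01 to 2.50", "2.51 to 5.00", "5.01 to 7.50", ">7.50")
)

table(toy$ws_category)
levels(toy$rain_category)
```

Building the rain factor with explicit labels keeps >7.50 as the last level, which avoids having to reorder the levels later.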
Here is a reproducible example of my data:
df <- structure(list(max_ws = c(2.4, 1.1, 0, 2.9, 3.8, 4.1, 3.9, 3.8,
2.6, 3.8, 4.2, 2.1, 2.9, 1.5, 2, 2.2, 3.1, 2.9, 3.1, 4.3, 4.1,
4.7, 3.1, 2.7, 5.7, 5.8, 3.8, 2.9, 0.3, 1.6, 0.8, 0, 1.9, 1.2,
4.3, 0.9, 2.4, 3.7, 4.8, 4.5, 3.5, 0, 2.3, 3.2, 3.2, 5, 3.3,
3.6, 2.4, 2.8, 4.7, 5.3, 4.4, 1.6, 5.3, 5.5, 4.6, 2.7, 3.5, 2.5,
2.3, 3.5, 4.7, 3.8, 4.4, 2.8, 5.4, 3.3, 4.7, 4, 3.3, 3.1, 2,
1.7, 2.7, 3.2, 3, 4.6, 4, 3.6, 3.2, 4.5, 3.8, 4.1, 3.3, 2, 3.2,
4.1, 4.3, 4.6, 4.5, 3.9, 3.1, 3.9, 4.6, 3.7, 3.4, 4.9, 3.2, 3.8,
4.6, 4, 1.9, 2.4, 3.3, 4.4, 3.4, 5.1, 4.6, 4.9, 3.4, 4, 3.6,
4.9, 4, 5.3, 5.6, 4.4, 5.5, 5.9, 5.8, 3.9, 5.1, 3.8, 3.3, 4.8,
3.7, 3.6, 4.3, 3, 4.8, 5.6, 4.3, 3, 4.8, 2.7, 4.4, 2.5, 4.5,
2.8, 3.4, 4.7, 4.1, 4.2, 4.5, 4.9, 4.5, 2.9, 3.2, 3, 1.6, 2.4,
3.3, 2.8, 3, 1.9, 3, 3.8, 3.1, 4.9, 5.3, 3.6, 3.8, 3.8, 2.5,
3.5, 3.8, 4.2, 4.9, 4, 3.9, 4, 3.9, 5.3, 4.5, 4.5, 4.8, 3.3,
2.7, 3.3, 3.5, 3.9, 4.8, 3.3, 2.9, 3, 4.5, 4.2, 3.6, 5.5, 6,
4.4, 4.6, 4.7, 2.9, 3.7, 2.5, 4.1, 3.2, 4.6, 4.7, 2.9, 2.9, 1.7,
3.6, 3.1, 3.6, 4.1, 3.4, 2.8, 3.3, 4.2, 3, 3.3, 2.4, 3.6, 2.8,
2.9, 4.3, 4, 3, 2, 2.3, 3.7, 3.8, 4.4, 4.3, 4.7, 3.5, 2.6, 3.9,
3.5, 2.8, 2.4, 3.7, 3.2, 2.5, 4.8, 3.7, 3.4, 2.9, 3.4, 2.5, 4,
2.2, 3.7, 2.6, 2.6, 2.3, 2.6, 3.1, 2.5, 3.1, 3.2, 3.9, 3.1, 2,
4.7, 2.3, 3.7, 3.3, 3.7, 3, 4.1, 3.6, 2.5, 3.3, 5.6, 4.5, 3.3,
3.6, 3.7, 4, 3.9, 4.2, 3.3, 4.5, 2.9, 6.2, 3, 3.7, 2.1, 3.2,
1.9, 3.3, 4, 3.6, 4.3, 3.7, 5.2, 3.9, 3.7, 2.9, 2.4, 3.8, 3.2,
3.1, 2.5, 2.8, 3.2, 3.8, 3.2, 4.6, 3.3, 4.2, 3.9, 4.4, 4.4, 3.6,
3, 4, 3.4, 4.3, 3.5, 2.5, 3.7, 3.3, 3.3, 1.2, 1.9, 2.9, 3.4,
1.4, 2.7, 3, 4.2, 5, 2, 3.7, 8, 5.7, 1.8, 3.3, 3.8, 2.7, 4.5,
3.6, 4.2, 5.2, 4.1, 4.9, 4.1, 2.9, 4.8, 4.9, 3.7, 2.7, 2.8, 5.2,
3.9, 3, 2.8, 1.4, 2.9, 5.9, 5.2, 4.2, 4.3, 6, 5.6, 4.1, 5.5,
4.2, 4.9, 5.7, 5.8), precipitation = c(0.1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 3, 0, 0, 0, 0, 0, 2.8, 0.6, 4.8, 9.8, 2.3, 0, 0, 0,
0.1, 2.3, 0.2, 0, 0, 0, 0, 0, 0, 0, 0.2, 0.1, 4.3, 10.4, 3, 5.6,
0, 0, 0, 0, 0.5, 3.3, 4.2, 2.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1.7, 0.1, 0, 0, 0, 2.5, 0.1, 0, 10, 0, 0.8, 0, 0, 0, 0, 0, 0,
0, 0.6, 0.3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.6, 0.4, 0.5, 0.5,
0.1, 0, 0, 0, 2.2, 1.9, 0, 8, 6, 0, 3.6, 0, 0, 0, 0.3, 0, 1,
1.1, 1.5, 1.1, 4.3, 0.9, 0.8, 0, 0.3, 2.7, 0.7, 0, 0, 0, 3.8,
0, 0.1, 0, 0.8, 0, 0.1, 12.1, 4.2, 0, 0, 0, 0, 3.1, 2.4, 0, 0.4,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.8, 19, 1, 0, 0, 3, 0, 4.8,
0.2, 2.9, 0.1, 1.6, 1.5, 0, 0, 0, 2, 5.3, 0, 6, 0, 0, 2.5, 0.4,
4.4, 20.7, 6.1, 3.4, 2.8, 0, 0.2, 3.7, 0.6, 0, 0, 0, 4.2, 0,
0, 7.3, 10.3, 1, 4.3, 0.2, 4.2, 0.7, 4, 7.7, 3.1, 19.1, 2.6,
0.9, 0, 0, 0, 0, 0, 0, 11.2, 0.6, 1.9, 4.4, 0, 0, 0.4, 0.6, 0,
5.4, 2.6, 3.4, 5, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0.4, 0, 0, 13.9,
0, 0.1, 2, 1.9, 3.3, 1.5, 0, 0, 0, 5.5, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0.1, 4.5, 0.9, 0.2, 3.9, 0, 0, 0, 0.7, 2, 0, 6.7, 1.4,
8.8, 10.9, 2, 3.8, 10.1, 0.1, 0, 0, 3.3, 0, 5.2, 1.9, 24.9, 2,
1.9, 0.1, 0.9, 0, 0, 10.5, 3.4, 0.2, 1.1, 2.1, 0.5, 0, 0, 0,
0, 0, 5.4, 0.8, 0.2, 0, 0, 0.3, 7.1, 0.2, 0.1, 3.9, 1.7, 3.2,
3.6, 0.4, 4.8, 0.3, 1, 0.9, 1.1, 0, 0, 0, 0, 0, 0, 2.3, 1, 0,
0, 0, 0, 0, 2.2, 0.1, 1.7, 0.3, 0, 0.7, 0, 1.9, 0.1, 3.2, 1.9,
1.4, 0, 0, 7.3, 8.7, 1.2, 5, 2.2, 0, 8.6, 3.7, 2.3, 5.1, 0.2,
0, 0, 3.5, 22, 1, 8.7, 2.6, 3.5, 0.2, 0.7, 0.9, 6.3, 7.8), ws_category = structure(c(1L,
1L, 1L, 2L, 3L, 4L, 4L, 3L, 2L, 3L, 4L, 1L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 4L, 4L, 5L, 2L, 2L, 5L, 5L, 3L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 4L, 1L, 1L, 3L, 5L, 4L, 3L, 1L, 1L, 2L, 2L, 5L, 3L, 3L, 1L,
2L, 5L, 5L, 4L, 1L, 5L, 5L, 4L, 2L, 3L, 1L, 1L, 3L, 5L, 3L, 4L,
2L, 5L, 3L, 5L, 4L, 3L, 2L, 1L, 1L, 2L, 2L, 2L, 4L, 4L, 3L, 2L,
4L, 3L, 4L, 3L, 1L, 2L, 4L, 4L, 4L, 4L, 4L, 2L, 4L, 4L, 3L, 3L,
5L, 2L, 3L, 4L, 4L, 1L, 1L, 3L, 4L, 3L, 5L, 4L, 5L, 3L, 4L, 3L,
5L, 4L, 5L, 5L, 4L, 5L, 5L, 5L, 4L, 5L, 3L, 3L, 5L, 3L, 3L, 4L,
2L, 5L, 5L, 4L, 2L, 5L, 2L, 4L, 1L, 4L, 2L, 3L, 5L, 4L, 4L, 4L,
5L, 4L, 2L, 2L, 2L, 1L, 1L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 5L, 5L,
3L, 3L, 3L, 1L, 3L, 3L, 4L, 5L, 4L, 4L, 4L, 4L, 5L, 4L, 4L, 5L,
3L, 2L, 3L, 3L, 4L, 5L, 3L, 2L, 2L, 4L, 4L, 3L, 5L, 5L, 4L, 4L,
5L, 2L, 3L, 1L, 4L, 2L, 4L, 5L, 2L, 2L, 1L, 3L, 2L, 3L, 4L, 3L,
2L, 3L, 4L, 2L, 3L, 1L, 3L, 2L, 2L, 4L, 4L, 2L, 1L, 1L, 3L, 3L,
4L, 4L, 5L, 3L, 2L, 4L, 3L, 2L, 1L, 3L, 2L, 1L, 5L, 3L, 3L, 2L,
3L, 1L, 4L, 1L, 3L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 4L, 2L, 1L,
5L, 1L, 3L, 3L, 3L, 2L, 4L, 3L, 1L, 3L, 5L, 4L, 3L, 3L, 3L, 4L,
4L, 4L, 3L, 4L, 2L, 5L, 2L, 3L, 1L, 2L, 1L, 3L, 4L, 3L, 4L, 3L,
5L, 4L, 3L, 2L, 1L, 3L, 2L, 2L, 1L, 2L, 2L, 3L, 2L, 4L, 3L, 4L,
4L, 4L, 4L, 3L, 2L, 4L, 3L, 4L, 3L, 1L, 3L, 3L, 3L, 1L, 1L, 2L,
3L, 1L, 2L, 2L, 4L, 5L, 1L, 3L, 5L, 5L, 1L, 3L, 3L, 2L, 4L, 3L,
4L, 5L, 4L, 5L, 4L, 2L, 5L, 5L, 3L, 2L, 2L, 5L, 4L, 2L, 2L, 1L,
2L, 5L, 5L, 4L, 4L, 5L, 5L, 4L, 5L, 4L, 5L, 5L, 5L), levels = c("[0,2.5]",
"(2.5,3.2]", "(3.2,3.8]", "(3.8,4.6]", "(4.6,11.6]"), class = "factor"),
rain_category = structure(c(3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 4L, 3L, 4L, 1L, 3L, 2L, 2L,
2L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 4L, 1L,
4L, 5L, 2L, 2L, 2L, 2L, 3L, 4L, 4L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 3L, 3L, 2L, 1L, 2L,
3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 3L, 3L,
2L, 1L, 5L, 2L, 4L, 2L, 2L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 4L,
3L, 3L, 2L, 3L, 4L, 3L, 2L, 2L, 2L, 4L, 2L, 3L, 2L, 3L, 2L,
3L, 1L, 4L, 2L, 2L, 2L, 2L, 4L, 3L, 2L, 3L, 2L, 3L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 1L, 3L, 2L, 2L, 4L, 2L, 4L,
3L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 3L, 5L, 2L, 5L, 2L, 2L, 3L,
3L, 4L, 1L, 5L, 4L, 4L, 2L, 3L, 4L, 3L, 2L, 2L, 2L, 4L, 2L,
2L, 5L, 1L, 3L, 4L, 3L, 4L, 3L, 4L, 1L, 4L, 1L, 4L, 3L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 4L, 2L, 2L, 3L, 3L, 2L, 5L,
4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 2L, 2L,
1L, 2L, 3L, 3L, 3L, 4L, 3L, 2L, 2L, 2L, 5L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 4L, 3L, 3L, 4L, 2L, 2L, 2L, 3L, 3L,
2L, 5L, 3L, 1L, 1L, 3L, 4L, 1L, 3L, 2L, 2L, 4L, 2L, 5L, 3L,
1L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 4L, 3L, 3L, 3L, 3L, 2L, 2L,
2L, 2L, 2L, 5L, 3L, 3L, 2L, 2L, 3L, 5L, 3L, 3L, 4L, 3L, 4L,
4L, 3L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 4L,
3L, 3L, 2L, 2L, 5L, 1L, 3L, 4L, 3L, 2L, 1L, 4L, 3L, 5L, 3L,
2L, 2L, 4L, 1L, 3L, 1L, 4L, 4L, 3L, 3L, 3L, 5L, 1L), levels = c(">7.50",
"0", "0.01 to 2.50", "2.51 to 5.00", "5.01 to 7.50"), class = "factor")), row.names = c(NA,
-366L), class = c("tbl_df", "tbl", "data.frame"))
When I try to plot, I get only one data point in each category, but there are many data points in my data. What step am I missing?
library(ggplot2)

p <- ggplot(df, aes(x = rain_category, y = ws_category, fill = precipitation)) +
  geom_point(size = 3) +
  theme_minimal()
p
By binning your data this way, all points overlap at each intersection of ws_category and rain_category. The whiteboard sketch resembles continuous data. If you want to keep the data labeled but give a sense of the number of points in each bin on the grid, you can add random noise (jitter). Be aware that this might lead to incorrect conclusions if it gives someone the impression that a point further to the right within a bin has a greater value.
Edit: Following a question raised in the comments, I added a plot of summarized data at the end.
library(tidyverse)

ggplot(df, aes(x = rain_category, y = ws_category, color = precipitation)) +
  geom_point(size = 1,
             position = position_jitterdodge(dodge.width = 0.25,
                                             jitter.height = 0.25)) +
  theme_minimal()
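As a quick check of the overlap described above, counting the rows per category pair shows how many points are stacked at each grid position. This is a minimal sketch on made-up data: the `toy` data frame, its values, and the `n_points` column name are hypothetical. With the real data you would pass `df` instead.

```r
library(dplyr)

# Hypothetical toy data: six rows, but only three distinct category pairs
toy <- data.frame(
  rain_category = c("0", "0", "0", "0.01 to 2.50", "0.01 to 2.50", ">7.50"),
  ws_category   = c("low", "low", "low", "high", "high", "high"),
  precipitation = c(0, 0, 0, 1.2, 0.4, 9.8)
)

# One row per (rain_category, ws_category) pair, with the number of
# observations that would be drawn on top of each other without jitter
overlap <- count(toy, rain_category, ws_category, name = "n_points")
overlap
```

Any pair with `n_points` greater than 1 appears as a single point in an unjittered scatter plot of the two categories.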
Consider faceting, as suggested by @MrFlick in a comment. Each pair of categories is plotted in its own panel, allowing you to plot points relative to each other using the actual measurements.
ggplot(df, aes(precipitation, max_ws)) +
  geom_point() +
  facet_grid(ws_category ~ rain_category,
             scales = 'free')
Before creating the plot, the code below calculates the average precipitation and counts the number of stations in each pairing of ws_category and rain_category. fct_relevel() is used to change the order of the levels, putting >7.50 at the end.
library(tidyverse)

df %>%
  summarise(avg_precipitation = mean(precipitation),
            n = n(),
            .by = c(ws_category, rain_category)) %>%
  ggplot(aes(x = fct_relevel(rain_category, ">7.50", after = Inf),
             y = ws_category,
             label = n,
             fill = avg_precipitation)) +
  geom_tile() +
  geom_text(size = 10) +
  labs(x = 'Rain Category',
       y = 'WS Category',
       fill = 'Average Precipitation')
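To illustrate what the fct_relevel() call in the plot above does, here is a small standalone sketch. The factor `f` is a hypothetical example whose levels are written out explicitly in the same order as in the question's data. `after = Inf` moves the named level to the last position.

```r
library(forcats)

# Hypothetical factor with ">7.50" as the first level, as in the question's data
f <- factor(
  c("0", ">7.50", "2.51 to 5.00"),
  levels = c(">7.50", "0", "0.01 to 2.50", "2.51 to 5.00", "5.01 to 7.50")
)

# Move ">7.50" to the end; the other levels keep their relative order
moved <- fct_relevel(f, ">7.50", after = Inf)

levels(f)
levels(moved)
```

Only the level order changes; the values stored in the factor stay the same, so the axis is reordered without altering the data.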
Created on 2023-10-07 with reprex v2.0.2