I am working on the Tidy Tuesday data this week and ran into my geom_area doing what I think is overlapping the data. If I facet_wrap the data then there are no missing values in any year, but as soon as I make an area plot and fill it the healthcare/education data seems to disappear.
Below are example plots of what I mean.
library(tidyverse)
chain_investment <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-10/chain_investment.csv')
plottable_investment <- chain_investment %>%
filter(group_num == c(12,17)) %>%
mutate(small_cat = case_when(
group_num == 12 ~ "Transportation",
group_num == 17 ~ "Education/Health"
)) %>%
group_by(small_cat, year, category) %>%
summarise(sum(gross_inv_chain)) %>%
ungroup %>%
rename(gross_inv_chain = 4)
# This plot shows that there is NO missing education, health, or highway data
# Goal is to combine the data on one plot and fill based on the category
plottable_investment %>%
ggplot(aes(year, gross_inv_chain)) +
geom_area() +
facet_wrap(~category)
# Some of the data in the health category gets lost? disappears? unknown
plottable_investment %>%
ggplot(aes(year, gross_inv_chain, fill = category)) +
geom_area()
# Something is going wrong here?
plottable_investment %>%
filter(category %in% c("Education","Health")) %>%
ggplot(aes(year, gross_inv_chain, fill = category)) +
geom_area(position = "identity")
# The data is definitely there
plottable_investment %>%
filter(category %in% c("Education","Health")) %>%
ggplot(aes(year, gross_inv_chain)) +
geom_area() +
facet_wrap(~category)
The issue is that you filtered your data using ==
instead of using %in%
.
In your case using ==
has the subtle side effect that for some categories (e.g. Health) your filtered data contains only obs for even years, while for others (e.g. Education) we end up with obs for only uneven years. As a result you end up with "two" area charts which overlap each other.
This could be easily seen by switching to geom_col
which gives you a "dodged" bar plot as we have only one category per year.
plottable_investment %>%
filter(category %in% c("Education","Health")) %>%
ggplot(aes(year, gross_inv_chain, fill = category)) +
geom_col()
Using %in%
instead gives the desired stacked area chart with all observations per category:
plottable_investment1 <- chain_investment %>%
filter(group_num %in% c(12,17)) %>%
mutate(small_cat = case_when(
group_num == 12 ~ "Transportation",
group_num == 17 ~ "Education/Health"
)) %>%
group_by(small_cat, year, category) %>%
summarise(gross_inv_chain = sum(gross_inv_chain)) %>%
ungroup()
#> `summarise()` has grouped output by 'small_cat', 'year'. You can override using the `.groups` argument.
plottable_investment1 %>%
filter(category %in% c("Education","Health")) %>%
ggplot(aes(year, gross_inv_chain, fill = category)) +
geom_area()