I'm currently working on a COVID-19 Germany Shiny app for a university project. I'm trying to make a bar plot that shows the daily infection numbers at different regional levels of Germany. This is not a Shiny-specific problem; it is a ggplot2 issue, and I reproduced it outside the Shiny environment. My basic code is the following:
library(tidyverse)
library(lubridate)
library(readr)
library(zoo)
data <- read_csv("https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv")
## Data Coding data Datensatz
data$Meldedatum <- ymd_hms(data$Meldedatum)
data$Meldedatum <- date(data$Meldedatum)
# Label Deutschland
data$label_de <- paste("Deutschland")
# Label Deutschland - Alter
data$label_de_age <- paste(data$label_de, data$Altersgruppe)
# Label Bundesland Alter
data$label_bl_age <- paste(data$Bundesland, data$Altersgruppe)
# Label SK/LK Alter
data$label_sklk_age <- paste(data$Landkreis, data$Altersgruppe)
#Data Long
data_long <- data[c( "Meldedatum", "AnzahlFall","Bundesland", "Landkreis" ,"label_de_age", "label_bl_age", "label_sklk_age")]
data_long$Deutschland <- "Deutschland"
data_long<- pivot_longer(data_long, -c( Meldedatum, AnzahlFall), values_to = "Gebiet")
data_long<- data_long[c("Meldedatum", "AnzahlFall", "Gebiet")]
The new labels in the column data_long$Gebiet are important for my Shiny app.
Now, if I plot the daily infection numbers of e.g. "Deutschland" (Germany) and "Bayern" (Bavaria) without position = "dodge", my graph looks like the following, which is fine at first.
# Plot Deutschland and Bayern
ggplot(data = subset(data_long, Gebiet %in% c("Deutschland", "Bayern")),
       mapping = aes(
         x = Meldedatum,
         y = AnzahlFall,
         fill = Gebiet
       )) +
  geom_bar(stat = "identity")
But if I add position = "dodge" to geom_bar(), my plot breaks and looks like the following.
# Plot Deutschland and Bayern with dodge
ggplot(data = subset(data_long, Gebiet %in% c("Deutschland", "Bayern")),
       mapping = aes(
         x = Meldedatum,
         y = AnzahlFall,
         fill = Gebiet
       )) +
  geom_bar(stat = "identity", position = "dodge")
Does somebody know why this happens and how to fix this?
Thanks for the help.
The issue is that you have multiple observations per date. Therefore you get multiple bars per date (and, of course, per region) when using position = "dodge". Stacking hides this because the bars pile up to the daily total, whereas dodging draws the bars of one region on top of each other, so the plot no longer shows the sum. To solve this, aggregate your data by date and region before plotting, e.g. by using count(Meldedatum, Gebiet, wt = AnzahlFall), which returns one row per date and region with a new variable (named n by default) holding the sum of cases:
library(tidyverse)
library(lubridate)
library(readr)
library(zoo)
data <- read_csv("https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv")
## Data Coding data Datensatz
data$Meldedatum <- ymd_hms(data$Meldedatum)
data$Meldedatum <- date(data$Meldedatum)
# Label Deutschland
data$label_de <- paste("Deutschland")
# Label Deutschland - Alter
data$label_de_age <- paste(data$label_de, data$Altersgruppe)
# Label Bundesland Alter
data$label_bl_age <- paste(data$Bundesland, data$Altersgruppe)
# Label SK/LK Alter
data$label_sklk_age <- paste(data$Landkreis, data$Altersgruppe)
#Data Long
data_long <- data[c( "Meldedatum", "AnzahlFall","Bundesland", "Landkreis" ,"label_de_age", "label_bl_age", "label_sklk_age")]
data_long$Deutschland <- "Deutschland"
data_long<- pivot_longer(data_long, -c( Meldedatum, AnzahlFall), values_to = "Gebiet")
data_long<- data_long[c("Meldedatum", "AnzahlFall", "Gebiet")]
data_long %>%
  count(Meldedatum, Gebiet, wt = AnzahlFall) %>%
  filter(Gebiet %in% c("Deutschland", "Bayern")) %>%
  ggplot(mapping = aes(
    x = Meldedatum,
    y = n,
    fill = Gebiet
  )) +
  geom_bar(stat = "identity", position = "dodge")
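To see what the aggregation step does without downloading the full data set, here is a minimal sketch on a small made-up table. The data frame toy and its values are hypothetical and only mimic the structure of data_long (several rows per date and region); the column names are taken from the question.

```r
library(dplyr)

# Hypothetical toy data: several rows per date and region, similar to the
# per-age-group rows in the real data set (values are made up)
toy <- tibble(
  Meldedatum = as.Date(c("2020-03-01", "2020-03-01", "2020-03-01",
                         "2020-03-01", "2020-03-02", "2020-03-02")),
  Gebiet     = c("Bayern", "Bayern", "Deutschland",
                 "Deutschland", "Bayern", "Deutschland"),
  AnzahlFall = c(2, 3, 10, 5, 4, 20)
)

# count() with wt sums AnzahlFall within each date/region combination
# and stores the result in a column named n
toy_daily <- toy %>%
  count(Meldedatum, Gebiet, wt = AnzahlFall)

# Equivalent, more explicit spelling of the same aggregation
toy_daily_alt <- toy %>%
  group_by(Meldedatum, Gebiet) %>%
  summarise(n = sum(AnzahlFall), .groups = "drop")

print(toy_daily)
```

After this step there is exactly one row per date and region, so each region contributes a single bar per date when the result is passed to the dodged plot above.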