Tags: r, ggplot2, geom-bar

ggplot2 bar plot breaks when position = "dodge" is used


I'm currently working on a COVID-19 Germany Shiny app for a university project. I'm trying to make a bar plot that shows the daily infection numbers for different regional levels of Germany. This is not specific to Shiny; it's more of a ggplot2 problem, and I reproduced it outside the Shiny app environment. My basic code is the following:

library(tidyverse)  # attaches dplyr, tidyr, ggplot2, readr
library(lubridate)  # ymd_hms(), date()
library(readr)      # read_csv()


data <- read_csv("https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv")

## Convert Meldedatum (report date) from a date-time string to a Date
data$Meldedatum <- ymd_hms(data$Meldedatum)
data$Meldedatum <- date(data$Meldedatum)

# Label: Deutschland (all of Germany)

data$label_de <- "Deutschland"

# Label: Deutschland + age group (Altersgruppe)

data$label_de_age <- paste(data$label_de, data$Altersgruppe)

# Label: federal state (Bundesland) + age group

data$label_bl_age <- paste(data$Bundesland, data$Altersgruppe)

# Label: district (SK = Stadtkreis, LK = Landkreis) + age group

data$label_sklk_age <- paste(data$Landkreis, data$Altersgruppe)

# Reshape to long format: each label column becomes its own row, with the label in Gebiet
data_long <- data[c("Meldedatum", "AnzahlFall", "Bundesland", "Landkreis", "label_de_age", "label_bl_age", "label_sklk_age")]
data_long$Deutschland <- "Deutschland"

data_long <- pivot_longer(data_long, -c(Meldedatum, AnzahlFall), values_to = "Gebiet")

data_long <- data_long[c("Meldedatum", "AnzahlFall", "Gebiet")]
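The reshaping step above is where the duplicates come from: every case record contributes one row per label column, so a value such as "Bayern" shows up once per Bavarian record on the same day. A minimal sketch with hypothetical toy records (made-up numbers and districts, and without the age-group label columns for brevity) that mirrors the same pivot_longer() call:

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical toy records for one day: two Bavarian districts and one Berlin district
toy <- tibble(
  Meldedatum = as.Date("2020-04-01"),
  AnzahlFall = c(5, 3, 4),
  Bundesland = c("Bayern", "Bayern", "Berlin"),
  Landkreis  = c("LK Passau", "LK Rosenheim", "SK Berlin Mitte")
)
toy$Deutschland <- "Deutschland"

# Same reshaping as in the question: each label column becomes a row, label stored in Gebiet
toy_long <- pivot_longer(toy, -c(Meldedatum, AnzahlFall), values_to = "Gebiet")

# Count how many rows share each (date, region) pair
rows_per_region <- count(toy_long, Meldedatum, Gebiet, name = "rows")
rows_per_region
```

Here "Deutschland" gets three rows and "Bayern" two rows for the same date, which is exactly the situation a bar chart has to resolve somehow.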

The specific new labels in the column data_long$Gebiet are important for my Shiny app. If I plot the daily infection numbers of, e.g., "Deutschland" (Germany) and "Bayern" (Bavaria) without position = "dodge", my graph looks like the following, which seems fine at first:

# Plot Deutschland and Bayern
ggplot(data =  subset(data_long, Gebiet %in% c("Deutschland", "Bayern" )), 
       mapping = aes(
         x= Meldedatum,
         y= AnzahlFall,
         fill = Gebiet
       ) )+
  geom_bar(stat = "identity")

[Plot 1: without dodge]
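Why the stacked version looks plausible can be checked on hypothetical toy rows (made-up values, same long format as data_long): with the default position = "stack", every row becomes its own segment, so the segments of one region add up to that region's daily total. Note that the two regions are also stacked on each other, so the top of the bar is Deutschland plus Bayern. A minimal sketch that inspects the built layer with ggplot2's layer_data():

```r
library(ggplot2)
library(tibble)

# Hypothetical long-format rows for one day: Bayern has two records,
# Deutschland has three (the same two plus one from another state)
toy_long <- tibble(
  Meldedatum = as.Date("2020-04-01"),
  AnzahlFall = c(5, 3, 5, 3, 4),
  Gebiet     = c("Bayern", "Bayern", "Deutschland", "Deutschland", "Deutschland")
)

# Default position = "stack": each row is drawn as a separate stacked segment
p_stack <- ggplot(toy_long, aes(x = Meldedatum, y = AnzahlFall, fill = Gebiet)) +
  geom_bar(stat = "identity")

built <- layer_data(p_stack)

# Summed segment heights per group (group 1 = Bayern, group 2 = Deutschland)
segment_totals <- tapply(built$ymax - built$ymin, built$group, sum)

# Top of the whole bar: all five rows of both regions stacked together
bar_top <- max(built$ymax)
```

So the colored segments do show the daily totals (8 and 12 here), which is why the first plot looks fine; dodging removes the stacking and with it this implicit summing.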

But if I add position = "dodge" to geom_bar(), my plot breaks and looks like the following:

# Plot Deutschland and Bayern with dodge
ggplot(data =  subset(data_long, Gebiet %in% c("Deutschland", "Bayern" )), 
       mapping = aes(
         x= Meldedatum,
         y= AnzahlFall,
         fill = Gebiet
       ) )+
  geom_bar(stat = "identity", position = "dodge")

[Plot 2: with dodge]

Does anybody know why this happens and how to fix it?

Thanks for the help.


Solution

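  • The issue is that you have multiple observations per date and region: after pivot_longer(), every case record contributes its own row, so "Bayern" appears once per Bavarian record on each day. With the default position = "stack", these rows are stacked on top of each other, so each region's segment adds up to the daily total. With position = "dodge", rows that share the same date and region are placed in the same slot and drawn on top of one another instead of being summed, so the bars no longer show daily totals. To solve this, aggregate your data by date and region before plotting, e.g. with count(Meldedatum, Gebiet, wt = AnzahlFall), which collapses the data to one row per date and region and stores the sum of cases in a new variable (named n by default):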

    library(tidyverse)  # attaches dplyr, tidyr, ggplot2, readr
    library(lubridate)  # ymd_hms(), date()
    library(readr)      # read_csv()
    
    
    data <- read_csv("https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv")
    
    ## Convert Meldedatum (report date) from a date-time string to a Date
    data$Meldedatum <- ymd_hms(data$Meldedatum)
    data$Meldedatum <- date(data$Meldedatum)
    
    # Label: Deutschland (all of Germany)
    
    data$label_de <- "Deutschland"
    
    # Label: Deutschland + age group (Altersgruppe)
    
    data$label_de_age <- paste(data$label_de, data$Altersgruppe)
    
    # Label: federal state (Bundesland) + age group
    
    data$label_bl_age <- paste(data$Bundesland, data$Altersgruppe)
    
    # Label: district (SK = Stadtkreis, LK = Landkreis) + age group
    
    data$label_sklk_age <- paste(data$Landkreis, data$Altersgruppe)
    
    # Reshape to long format: each label column becomes its own row, with the label in Gebiet
    data_long <- data[c("Meldedatum", "AnzahlFall", "Bundesland", "Landkreis", "label_de_age", "label_bl_age", "label_sklk_age")]
    data_long$Deutschland <- "Deutschland"
    
    data_long <- pivot_longer(data_long, -c(Meldedatum, AnzahlFall), values_to = "Gebiet")
    
    data_long <- data_long[c("Meldedatum", "AnzahlFall", "Gebiet")]
    
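    data_long %>% 
      # keep only the regions to plot (filtering first leaves fewer rows to sum)
      filter(Gebiet %in% c("Deutschland", "Bayern")) %>% 
      # one row per date and region; n = sum of AnzahlFall
      count(Meldedatum, Gebiet, wt = AnzahlFall) %>% 
      ggplot(mapping = aes(
             x = Meldedatum,
             y = n,
             fill = Gebiet
           )) +
      geom_bar(stat = "identity", position = "dodge")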
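To check the fix without downloading the real dataset, here is a minimal sketch on hypothetical toy rows (made-up values in the same long format as data_long). It confirms that count(..., wt = AnzahlFall) gives the same sums as an explicit group_by() + summarise(), and that after aggregation the dodged plot has exactly one bar per date and region. It uses geom_col(), which is ggplot2's shorthand for geom_bar(stat = "identity"):

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Hypothetical long-format rows: several rows per (date, region)
toy_long <- tibble(
  Meldedatum = as.Date(rep(c("2020-04-01", "2020-04-02"), times = c(5, 2))),
  AnzahlFall = c(5, 3, 5, 3, 4, 2, 2),
  Gebiet     = c("Bayern", "Bayern", "Deutschland", "Deutschland", "Deutschland",
                 "Bayern", "Deutschland")
)

# Aggregate first: one row per (date, region), n = summed AnzahlFall
daily <- toy_long %>%
  filter(Gebiet %in% c("Deutschland", "Bayern")) %>%
  count(Meldedatum, Gebiet, wt = AnzahlFall)

# The same aggregation written out explicitly
daily_check <- toy_long %>%
  group_by(Meldedatum, Gebiet) %>%
  summarise(n = sum(AnzahlFall), .groups = "drop")

# Dodged bars on the aggregated data
p <- ggplot(daily, aes(x = Meldedatum, y = n, fill = Gebiet)) +
  geom_col(position = "dodge")

# Built layer: expect one rectangle per (date, region), each at its own x-slot
built <- layer_data(p)
```

Filtering before counting is optional here because Gebiet is one of the grouping variables, so the sums are the same either way; on the full dataset it just avoids summing regions that are never plotted.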