
Plotting in ggplot using cumsum


I am trying to use ggplot2 to plot a date column vs. a numeric column.

I have a dataframe that I am trying to split by country into China and not China, and I successfully created the dataframe linked below with:

is_china <- confirmed_cases_worldwide %>%
  filter(country == "China", type=='confirmed') %>%
  group_by(country) %>%
  mutate(cumu_cases = cumsum(cases)) 

is_not_china <- confirmed_cases_worldwide %>%
  filter(country != "China", type=='confirmed') %>%
  mutate(cumu_cases = cumsum(cases))

is_not_china$country <- "Not China"

china_vs_world <- rbind(is_china,is_not_china)

Now I want to plot a line graph of cumu_cases against date, with one line for "China" and one for "Not China". I am trying to execute this code:

plt_china_vs_world <- ggplot(china_vs_world) +
  geom_line(aes(x=date,y=cumu_cases,group=country,color=country)) +
  ylab("Cumulative confirmed cases") 

But I keep getting a graph that does not look right (image not included).

I don't understand why this is happening. I have tried converting data types and other methods. Any help is appreciated. Both CSV files are in the repository linked below.

https://github.com/king-sules/Covid


Solution

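  • Once every other 'country' is relabelled 'Not China', each 'date' is repeated within that group, once per original country, so geom_line gets several values for the same date. The data needs to be summed to one row per 'country' and 'date' before taking the cumulative sum. This can be done either in the OP's 'is_not_china' step or on 'china_vs_world':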

    library(ggplot2)
    library(dplyr)
    china_vs_world %>%
      group_by(country, date) %>%
      summarise(cases = sum(cases)) %>%       # one row per country and date
      group_by(country) %>%                   # separate running total per country
      mutate(cumu_cases = cumsum(cases)) %>%
      ungroup() %>%
      ggplot() +
      geom_line(aes(x = date, y = cumu_cases, group = country, color = country)) +
      ylab("Cumulative confirmed cases")
    


    NOTE: The China numbers look small only because of the y-axis scale.

    As @Edward mentioned, a log scale makes the plot easier to read:

    china_vs_world %>%
      group_by(country, date) %>%
      summarise(cases = sum(cases)) %>%       # one row per country and date
      group_by(country) %>%                   # separate running total per country
      mutate(cumu_cases = cumsum(cases)) %>%
      ungroup() %>%
      ggplot() +
      geom_line(aes(x = date, y = cumu_cases, group = country, color = country)) +
      ylab("Cumulative confirmed cases") +
      scale_y_continuous(trans = 'log')       # natural-log y-axis
    


    Or with facet_wrap, which gives each country its own panel and y-axis:

    china_vs_world %>%
      group_by(country, date) %>%
      summarise(cases = sum(cases)) %>%       # one row per country and date
      group_by(country) %>%                   # separate running total per country
      mutate(cumu_cases = cumsum(cases)) %>%
      ungroup() %>%
      ggplot() +
      geom_line(aes(x = date, y = cumu_cases, group = country, color = country)) +
      ylab("Cumulative confirmed cases") +
      facet_wrap(~ country, scales = 'free_y')
    


    data

    china_vs_world <- read.csv("https://raw.githubusercontent.com/king-sules/Covid/master/china_vs_world.csv", stringsAsFactors = FALSE)
    china_vs_world$date <- as.Date(china_vs_world$date)
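
To see the repeated-date problem without downloading the CSV, here is a minimal sketch on made-up data. The `toy` data frame, its values, and the names `dupes` and `fixed` are assumptions for illustration and are not taken from the OP's files. It first counts rows per 'country' and 'date' after the relabelling, then sums to one row per pair and takes the cumulative sum within each country.

```r
library(dplyr)

# Toy data (hypothetical values): two non-China countries share each date.
toy <- data.frame(
  country = c("China", "China", "Italy", "Italy", "Iran", "Iran"),
  date    = as.Date(c("2020-03-01", "2020-03-02",
                      "2020-03-01", "2020-03-02",
                      "2020-03-01", "2020-03-02")),
  cases   = c(10, 20, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Relabel as in the question: everything that is not China becomes "Not China".
toy$country <- ifelse(toy$country == "China", "China", "Not China")

# Rows per country and date. Any n > 1 means one group has several
# y values for the same x, which is what distorts the line.
dupes <- toy %>%
  count(country, date) %>%
  filter(n > 1)

# One row per country and date, then a running total within each country.
fixed <- toy %>%
  group_by(country, date) %>%
  summarise(cases = sum(cases)) %>%
  group_by(country) %>%
  mutate(cumu_cases = cumsum(cases)) %>%
  ungroup()

print(dupes)
print(fixed)
```

If `dupes` has any rows for the real data, the cumulative sum was taken before the dates were collapsed. `fixed` has exactly one row per country and date, so it can be passed straight to the geom_line call used above.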