
How to merge two datasets and plot them in R


I have a COVID dataset with cases and deaths per county and state, and a separate dataset of state populations. I think I need to merge the two datasets and then plot the number of cases and deaths per capita. How can I plot cases and deaths per capita for each state?

I have the following code for the merge, but it repeats each state over and over and doesn't work:

```r
# Calculate the average cases and deaths per state
covid %>%
  group_by(state) %>%
  summarise(ave_cases = ave(cases, na.rm = TRUE),
            ave_deaths = ave(deaths, na.rm = TRUE))
```
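Part of the repetition most likely comes from `ave()`. Unlike `mean()`, base R's `ave()` returns a vector as long as its input, so inside `summarise()` it yields one value per row rather than one per state, and dplyr 1.0 and later then return one row per input row (dplyr 1.1.0 deprecates this in favour of `reframe()`). Also, `na.rm` is not an argument of `ave()`: it is absorbed by `...`, which `ave()` treats as grouping variables, so missing values are not dropped. A minimal sketch using `mean()` instead, on a hypothetical toy data frame `toy` (made up for illustration, not the asker's data):

```r
library(dplyr)

# Hypothetical toy data standing in for the county-level COVID table
toy <- data.frame(
  state  = c("A", "A", "B", "B", "B"),
  cases  = c(10, 20, 5, NA, 7),
  deaths = c(1, 3, 0, 1, NA)
)

# ave() returns one value per input row, so it cannot collapse a group
length(ave(toy$cases))  # 5, the same as nrow(toy)

# mean() returns a single value per group, giving exactly one row per state
state_avg <- toy %>%
  group_by(state) %>%
  summarise(ave_cases  = mean(cases,  na.rm = TRUE),
            ave_deaths = mean(deaths, na.rm = TRUE))

state_avg
```

Here `state_avg` has two rows: state A with averages 15 and 2, and state B with 6 and 0.5.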

```r
# Merge the two data frames to get the population of each state
covid <- rownames_to_column(covid, var = "state")
covid_new <- covid %>%
  left_join(US_state_pop_2020_estimate, by = c("state_territory" = "state")) %>%
  tibble()
covid_new
```
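In `left_join(x, y, by = c("a" = "b"))`, the left-hand name `"a"` is a column of the first table and the right-hand name `"b"` a column of the second. Since `covid` holds `state` and the population table holds `state_territory`, the key would be written `by = c("state" = "state_territory")`; the `rownames_to_column()` step is not needed for this and should error if `covid` already has a `state` column. Because the COVID table has many county rows per state, each state still appears many times after the join, which is expected until the rows are summarised. A sketch on hypothetical toy tables `covid_toy` and `pop_toy` (made up for illustration):

```r
library(dplyr)

# Hypothetical toy tables: covid_toy uses `state`, pop_toy uses `state_territory`
covid_toy <- data.frame(state  = c("Ohio", "Ohio", "Utah"),
                        deaths = c(2, 4, 1))
pop_toy <- data.frame(state_territory = c("Ohio", "Utah", "Guam"),
                      population      = c(100, 50, 10))

# Left-hand name = column in the first table, right-hand name = column in the second
joined <- covid_toy %>%
  left_join(pop_toy, by = c("state" = "state_territory"))

# Every county row gets its state's population; Guam has no COVID rows, so it is dropped
joined
```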

Solution

  • Perhaps something like this?

    library(dplyr)
    library(ggplot2)
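    # Discourage scientific notation in printed numbers and axis/legend labels
    options(scipen = 999999)
    
    # State population estimates (tab-separated) and county-level COVID data
    dat1 <- read.delim("~/Downloads/US_state_pop_2020_estimate.txt", sep = "\t")
    dat2 <- read.csv("~/Downloads/us-counties.csv")
    
    # Give the population table the same key name as the COVID table, then join
    dat1_renamed <- rename(dat1, "state" = "state_territory")
    covid_new <- left_join(dat1_renamed, dat2, by = "state")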
    
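    covid_new %>%
      group_by(state) %>%
      # Total cases, and deaths divided by the state's population, summed over
      # every row (county and date) of that state
      summarise(number_of_cases = sum(cases),
                deaths_per_capita = sum(deaths / population)) %>%
      # One bar per state: height = deaths per capita, fill colour = number of cases
      ggplot(aes(x = state, y = deaths_per_capita, fill = number_of_cases)) +
      geom_col() +
      # Rotate the state names so they don't overlap
      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
            axis.title.x = element_blank()) +
      scale_fill_viridis_c()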
    

    Created on 2021-10-06 by the reprex package (v2.0.1)
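The code above sums every row for a state and plots only deaths per capita, with cases shown as fill colour. If `us-counties.csv` is the New York Times file (an assumption, since the file isn't shown), `cases` and `deaths` are cumulative totals per county per date, so summing across dates inflates the figures many times over. A hedged sketch that instead keeps each county's latest row, divides both cases and deaths by population, and plots the two rates side by side. The toy data frames `counties` and `pop` are hypothetical, and the sketch additionally uses tidyr's `pivot_longer()`, which the answer does not:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data shaped like the two files.
# Assumption: `cases` and `deaths` are cumulative totals per county per date.
counties <- data.frame(
  date   = as.Date(c("2021-10-01", "2021-10-01", "2021-10-02",
                     "2021-10-02", "2021-10-02")),
  county = c("X", "Y", "X", "Y", "Z"),
  state  = c("Ohio", "Ohio", "Ohio", "Ohio", "Utah"),
  cases  = c(10, 5, 12, 6, 8),
  deaths = c(1, 0, 1, 1, 2)
)
pop <- data.frame(state_territory = c("Ohio", "Utah"),
                  population      = c(1000, 400))

per_capita <- counties %>%
  # Keep each county's most recent row; earlier rows are already counted in it
  group_by(state, county) %>%
  filter(date == max(date)) %>%
  # Regroup by state alone and add up the county totals
  group_by(state) %>%
  summarise(cases  = sum(cases,  na.rm = TRUE),
            deaths = sum(deaths, na.rm = TRUE)) %>%
  left_join(pop, by = c("state" = "state_territory")) %>%
  mutate(cases_per_capita  = cases / population,
         deaths_per_capita = deaths / population)

# Long format: one row per state and measure, so both rates can be faceted
long <- per_capita %>%
  select(state, cases_per_capita, deaths_per_capita) %>%
  pivot_longer(-state, names_to = "measure", values_to = "per_capita")

p <- ggplot(long, aes(x = state, y = per_capita)) +
  geom_col() +
  facet_wrap(~ measure, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
        axis.title.x = element_blank())
p
```

With the real files, `counties` would be replaced by `dat2` and `pop` by `dat1`; `max(date)` also works if `read.csv()` leaves `date` as an ISO-formatted string. `scales = "free_y"` gives each facet its own y axis, since case rates are typically much larger than death rates.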