Tags: r, ggplot2, tidyverse

R data summary and graphs


  library(dplyr)
  library(tidyr)

  # Example data: one row per patient, with 0/1 flags for each drug
  ID <- c(1, 2, 3, 4, 5)
  DrugA <- c(1, 1, 0, 0, 0)
  DrugB <- c(0, 0, 1, 0, 1)
  TreatmentLevel <- c("Inpatient", "Outpatient", "Outpatient", "Inpatient", "Outpatient")
  df <- data.frame(ID, DrugA, DrugB, TreatmentLevel)

  # Attempted summary (this produces the wrong counts shown below)
  df %>%
    pivot_longer(starts_with("Drug"), names_to = "Var") %>%
    group_by(TreatmentLevel, Var) %>%
    summarise(value = sum(!is.na(value))) %>%
    pivot_wider(names_from = TreatmentLevel, values_from = value)
# A tibble: 2 x 3
  Var   Inpatient Outpatient
  <chr>     <int>      <int>
1 DrugA         2          3
2 DrugB         2          3

I want to summarize the data into the table below, but the code above gives me the wrong counts. Could you also help me draw a graph with ggplot2?



Solution

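  • The issue is that sum(!is.na(value)) counts every non-missing row, including the rows where the value is 0. To count only the 1s, sum the values instead, for example by using count() with wt = value: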

    library(dplyr)
    library(tidyr)
    library(ggplot2)
    
    tbl <- df %>%
      pivot_longer(starts_with("Drug"), names_to = "Var") %>%
      count(TreatmentLevel, Var, wt = value)
    
    tbl
    #> # A tibble: 4 × 3
    #>   TreatmentLevel Var       n
    #>   <chr>          <chr> <dbl>
    #> 1 Inpatient      DrugA     1
    #> 2 Inpatient      DrugB     0
    #> 3 Outpatient     DrugA     1
    #> 4 Outpatient     DrugB     2
    
    ggplot(tbl, aes(Var, n, fill = TreatmentLevel)) +
      geom_col(position = "dodge2")
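
If you would rather keep the wide layout from the question (one row per drug, one column per treatment level), the same fix can be combined with the original pivot_wider() step. This is only a minimal sketch, assuming the drug columns hold 0/1 flags; the name wide_tbl is made up for illustration, and na.rm = TRUE is an added assumption so that any missing flags are ignored.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  DrugA = c(1, 1, 0, 0, 0),
  DrugB = c(0, 0, 1, 0, 1),
  TreatmentLevel = c("Inpatient", "Outpatient", "Outpatient",
                     "Inpatient", "Outpatient")
)

# wide_tbl is a hypothetical name used only for this sketch
wide_tbl <- df %>%
  pivot_longer(starts_with("Drug"), names_to = "Var") %>%
  group_by(TreatmentLevel, Var) %>%
  # sum() adds up the 0/1 flags, so rows with 0 contribute nothing
  summarise(value = sum(value, na.rm = TRUE), .groups = "drop") %>%
  # spread the treatment levels back out into columns
  pivot_wider(names_from = TreatmentLevel, values_from = value)

wide_tbl
# DrugA: Inpatient 1, Outpatient 1
# DrugB: Inpatient 0, Outpatient 2
```

The long table tbl from above is still the more convenient input for ggplot2, so the wide version is mainly useful for display.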