Search code examples
rdplyrtibble

How can i add missing month value and remove duplicate in dplyr in R?


I have a dataset that looks like this :

vaR date
A 1/1/2022
A 1/2/2022
A 1/3/2022
B 1/1/2022
B 1/3/2022
C 1/1/2022
C 1/1/2022
C 1/2/2022
C 1/2/2022
C 1/3/2022

And i want to be arranged by month and by the var variable. But if a month is not recorded (missing) i want to be added (to be appeared ) in the new column named Month and to mutate (dplyr phrase) another column that will check if there was an entry on that month (logical condition).But there are some entries for example C that has more that one entries which counts for one (distinct).

Ideally is must look like this :

var Quarter Month Condition
A 1 1 TRUE
A 1 2 TRUE
A 1 3 TRUE
B 1 1 TRUE
B 1 2 FALSE
B 1 3 TRUE
C 1 1 TRUE
C 1 2 TRUE
C 1 3 TRUE

As a start i have tried this one in R :

var = c(rep("A",3),rep("B",2),rep("C",5));var
date = c(as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/03/01"),
         as.Date("2022/01/01"),as.Date("2022/03/01"),
         as.Date("2022/01/01"),as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/02/01"),as.Date("2022/03/01"))
data = tibble(var,date)
quarter = 1
data%>%
  dplyr::mutate(month = lubridate::month(date),
                Quarter = quarter)

But i don't know how to add the missing month and check for the verified condition. Any help ?


Solution

  • You can use complete() to fill in the missing months and then check whether they have an associated date, then use distinct() to find the unique combinations.

    library(dplyr)
    library(tidyr)
    
    var = c(rep("A",3),rep("B",2),rep("C",5))
    date = c(as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/03/01"),
             as.Date("2022/01/01"),as.Date("2022/03/01"),
             as.Date("2022/01/01"),as.Date("2022/01/01"),as.Date("2022/02/01"),as.Date("2022/02/01"),as.Date("2022/03/01"))
    data = tibble(var,date)
    quarter = 1
    data %>% 
      mutate(month = lubridate::month(date)) %>% 
      complete(var, month) %>% 
      mutate(Quarter = quarter,
             Condition = !is.na(date)) %>% 
      distinct(var, month, Quarter, Condition)
    #> # A tibble: 9 × 4
    #>   var   month Quarter Condition
    #>   <chr> <dbl>   <dbl> <lgl>    
    #> 1 A         1       1 TRUE     
    #> 2 A         2       1 TRUE     
    #> 3 A         3       1 TRUE     
    #> 4 B         1       1 TRUE     
    #> 5 B         2       1 FALSE    
    #> 6 B         3       1 TRUE     
    #> 7 C         1       1 TRUE     
    #> 8 C         2       1 TRUE     
    #> 9 C         3       1 TRUE
    

    Created on 2022-06-01 by the reprex package (v2.0.1)