
Aggregate week and date in R by some specific rules


I'm not used to R. I already asked a question on Stack Overflow and got a great answer. I'm sorry to post a similar question, but I have tried many times and keep getting output that I don't expect. This time, I want to do something slightly different from my previous question, "Merge two data with respect to date and week using R". I have two data frames. One has a year_month_week column and the other has a date column.

df1<-data.frame(id=c(1,1,1,2,2,2,2),
               year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
               points=c(65,58,47,21,25,27,43))

df2<-data.frame(id=c(1,1,1,2,2,2),
                date=c(20220503,20220506,20220512,20220401,20220408,20220409),
                temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))

In df1, 2022051 means the 1st week of May 2022. Likewise, 2022052 means the 2nd week of May 2022. In df2, 20220503 means May 3rd, 2022. What I want to do now is merge df1 and df2 on year_month_week. In this case, 20220503 and 20220506 both fall in the 1st week of May 2022. If more than one date falls in the same year_month_week, I only want to keep the first of them. Now, here's the different part: if no date falls in a given year_month_week, the temperature should just be NA. So my expected output has the same number of rows as df1, which contains the year_month_week column. My expected output is as follows:

df<-data.frame(id=c(1,1,1,2,2,2,2),
               year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
               points=c(65,58,47,21,25,27,43),
               temperature=c(36.1,36.6,NA,34.3,34.9,NA,NA))

Solution

  • First we can parse the dates in df2 and derive a year_month_week key in the same format as df1, then left-join the two tables:

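    library(dplyr)
    library(lubridate)

    # Parse the numeric yyyymmdd values as dates
    df2$dt = ymd(df2$date)
    # Week of the month, assuming days 1-7 are week 1, days 8-14 week 2, and so on
    df2$wk = (day(df2$dt) - 1) %/% 7 + 1
    # Build a key in the same format as df1$year_month_week
    df2$year_month_week = as.numeric(paste0(format(df2$dt, "%Y%m"), df2$wk))
    
    # Keep the first row per week, then left-join so every row of df1 is kept
    df1 %>%
      left_join(df2 %>% group_by(year_month_week) %>% slice(1) %>%
                  select(year_month_week, temperature))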
    

    Result

    Joining, by = "year_month_week"
      id year_month_week points temperature
    1  1         2022051     65        36.1
    2  1         2022052     58        36.6
    3  1         2022053     47          NA
    4  2         2022041     21        34.3
    5  2         2022042     25        34.9
    6  2         2022043     27          NA
    7  2         2022044     43          NA
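
    One caveat: the join above matches on year_month_week only, which works here because the two ids never share a week. If they could, a variant that also keys on id may be safer. Below is a minimal base R sketch under that assumption. The names dt, wk, first_obs and out are only illustrative, and weeks are assumed to run over days 1-7, 8-14, and so on, which the question does not spell out.

```r
# Example data from the question.
df1 <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2),
                  year_month_week = c(2022051, 2022052, 2022053,
                                      2022041, 2022042, 2022043, 2022044),
                  points = c(65, 58, 47, 21, 25, 27, 43))
df2 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  date = c(20220503, 20220506, 20220512,
                           20220401, 20220408, 20220409),
                  temperature = c(36.1, 36.3, 36.6, 34.3, 34.9, 35.3))

# Parse the numeric yyyymmdd values as dates.
dt <- as.Date(as.character(df2$date), format = "%Y%m%d")

# Week of the month (assumption: days 1-7 are week 1, 8-14 week 2, ...).
wk <- (as.integer(format(dt, "%d")) - 1) %/% 7 + 1

# Key in the same format as df1$year_month_week.
df2$year_month_week <- as.numeric(paste0(format(dt, "%Y%m"), wk))

# Keep only the earliest date per (id, week).
df2 <- df2[order(df2$id, df2$date), ]
first_obs <- df2[!duplicated(df2[c("id", "year_month_week")]),
                 c("id", "year_month_week", "temperature")]

# Left join on both keys: every row of df1 is kept, unmatched weeks get NA.
out <- merge(df1, first_obs, by = c("id", "year_month_week"), all.x = TRUE)
out <- out[order(out$id, out$year_month_week), ]
print(out)
```

    On the example data this should give the same seven rows as the result above. The only change in behaviour is that a week shared by two ids would keep one temperature per id instead of one overall.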