I'm not used to using R. I already asked a question on stack overflow and got a great answer. I'm sorry to post a similar question, but I tried many times and got the output that I didn't expect. This time, I want to do slightly different from my previous question. Merge two data with respect to date and week using R I have two data. One has a year_month_week column and the other has a date column.
df1<-data.frame(id=c(1,1,1,2,2,2,2),
year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
points=c(65,58,47,21,25,27,43))
df2<-data.frame(id=c(1,1,1,2,2,2),
date=c(20220503,20220506,20220512,20220401,20220408,20220409),
temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))
For df1, 2022051 means 1st week of May,2022. Likewise, 2022052 means 2nd week of May,2022. For df2,20220503 means May 3rd, 2022. What I want to do now is merge df1 and df2 with respect to year_month_week. In this case, 20220503 and 20220506 are 1st week of May,2022.If more than one date are in year_month_week, I will just include the first of them. Now, here's the different part. Even if there is no date
inside year_month_week
,just leave it NA
. So my expected output has a same number of rows as df1
which includes the column year_month_week
.So my expected output is as follows:
df<-data.frame(id=c(1,1,1,2,2,2,2),
year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
points=c(65,58,47,21,25,27,43),
temperature=c(36.1,36.6,NA,34.3,34.9,NA,NA))
First we can convert the dates in df2
into year-month-date format, then join the two tables:
library(dplyr);library(lubridate)
df2$dt = ymd(df2$date)
df2$wk = day(df2$dt) %/% 7 + 1
df2$year_month_week = as.numeric(paste0(format(df2$dt, "%Y%m"), df2$wk))
df1 %>%
left_join(df2 %>% group_by(year_month_week) %>% slice(1) %>%
select(year_month_week, temperature))
Result
Joining, by = "year_month_week"
id year_month_week points temperature
1 1 2022051 65 36.1
2 1 2022052 58 36.6
3 1 2022053 47 NA
4 2 2022041 21 34.3
5 2 2022042 25 34.9
6 2 2022043 27 NA
7 2 2022044 43 NA