Cleaning dataset based on a reference file

I am trying to clean data_set by replacing some values for some variables with the new values which is included in the Changes_file after matching the ID and the variable name.

The ID is unique in the data_set but it could be duplicated in the changes_file

Example:

> data_set = data.frame(ID = c('20','22','25','60','40','44'), Gender = c('Female','Female','Female','Male','Male','Male'), Age = c('41','45','50','52','36','61'))

> data_set
  ID Gender Age
1 20 Female  41
2 22 Female  45
3 25 Female  50
4 60   Male  52
5 40   Male  36
6 44   Male  61

> Changes_file = data.frame(ID = c('22','22','60','40'), Variables = c('Gender','Age','Age','Gender'), New_Information  = c('Male','50','17','Female'))
> Changes_file
  ID Variables New_Information
1 22    Gender            Male
2 22       Age              50
3 60       Age              17
4 40    Gender          Female

The result is expected to be:

> data_set_new
  ID Gender Age
1 20 Female  41
2 22   Male  50
3 25 Female  50
4 60   Male  17
5 40 Female  36
6 44   Male  61

The data I am working on has more than 10,000 rows and more than 300 variables, and I just mentioned a simple example above.

I appreciate any advice or suggestions for this process and especially for the 'matching' part.

Thank you,

Solution

We can do this by pivoting your changes data so it's the same structure as the original data and then using a couple rows_* operations:

library(tidyr)
library(dplyr)

changes_wide = Changes_file |>
  pivot_wider(names_from = Variables, values_from = New_Information)

common_cols = intersect(names(changes_wide), names(data_set))

changes_wide |>
  ## "patch" the changes file first so the NAs don't overwrite 
  ## non-missing values
  rows_patch(y = data_set[common_cols], by = "ID", unmatched = "ignore") |>
  ## then update the original with the patched changes
  rows_update(x = data_set, by = "ID")
#   ID Gender Age
# 1 20 Female  41
# 2 22   Male  50
# 3 25 Female  50
# 4 60   Male  17
# 5 40 Female  36
# 6 44   Male  61