Search code examples
rexcelmerge

How can I merge one excel datasheet with a master excel sheet with only some columns matching?


I'll preface this by saying I'm still pretty new to R and programming in general but I'm taking baby steps and this assignment is one of my more challenging ones.

I have two excel spreadsheets and I'm trying to combine one of them (the working sheet) into the master sheet. The matching columns only include the ID and dates. There is an extra complication which is that the master sheet may not include all the recently added IDs on the working sheet but I still want the new IDs and related information added as new rows in the master sheet.

How could I best do this in R?

Hopefully that explanation made some sense and thank you for helping!

So far, I have only tried using purrr:map_dfr following the advice of others but it seems to only work (at least when I tried it) for sheets with matching columns.

Here are example tables!

Working sheet with #78 acting as a participant that was not added/is missing from the master sheet

ID Date Wave Notes Data_1 Data_2 Data_3
58 1/18/2014 Y 47.5 384.3 394.3
144 1/18/2014 Y2 65.3 38.3 90.4
58 4/18/2014 Y2 499.2 293.2 22.3
78 1/19/2018 Y5 24.4 22.1 12.1

Master sheet

ID Date Wave Data_4 Data_5 Data_6 Data_7 Diagnosis Age Status Gender
58 1/18/2014 Y1 39.4 3959.3 474.3 283.2 Negative 45 ... M
144 1/18/2014 Y2 484.3 4849.2 383.3 393.3 Positive 80 ... F
58 4/18/2015 Y2 501.6 394.2 2394.2 290.3 Positive 46 ... M

Ideally, from the working sheet, I would want to insert Data_1, Data_2, and Data_3 in specific places in the master sheet such as between wave and Data_4.


Solution

  • library(tidyverse)
    
    otherSheet <- data.frame(ID = c(58, 144, 58, 78), 
                              Date = c('1/18/2014', '1/18/2014', 
                                       '4/18/2014', '1/19/2018'), 
                              Wave = c('Y', 'Y2', 'Y2', 'Y5'), 
                              Notes = rep('',4 ), 
                              Data_1 = c(47.5, 65.3, 499.2, 24.4), 
                              Data_2 = c(384.2, 38.2, 293.2, 22.1), 
                              Data_3 = c(394.3, 90.4, 22.3, 12.1))
    
    masterSheet <- data.frame(ID = c(58, 144, 58), 
                              Date = c('1/18/2014', '1/18/2014', 
                                       '4/18/2014'), 
                              Wave = c('Y', 'Y2', 'Y2'), 
                              Data_4 = c(39.4, 484.3, 501.6), 
                              Data_5 = c(3959.3, 4849.2, 394.2), 
                              Data_6 = c(474.3, 383.3, 2394.2), 
                              Diagnosis = c('Negative', 'Postive', 'Positive'), 
                              Age = c(45, 80, 46), 
                              Status = rep('...', 3), 
                              Gender = c('M', 'F', 'M'))
    
    combinedSheets <- masterSheet %>% 
     full_join(otherSheet)
    

    sample