Search code examples
rdplyrreplacemissing-datatransfer

r transfer values from one dataset to another by ID


I have two datasets , the first dataset is like this

   ID     Weight     State
   1      12.34      NA
   2      11.23      IA
   2      13.12      IN
   3      12.67      MA 
   4      10.89      NA
   5      14.12      NA

The second dataset is a lookup table for state values by ID

   ID    State
   1     WY
   2     IA
   3     MA
   4     OR
   4     CA
   5     FL

As you can see there are two different state values for ID 4, which is normal.

What I want to do is replace the NAs in dataset1 State column with State values from dataset 2. Expected dataset

  ID     Weight     State
   1      12.34      WY
   2      11.23      IA
   2      13.12      IN
   3      12.67      MA 
   4      10.89      OR,CA
   5      14.12      FL

Since ID 4 has two state values in dataset2 , these two values are collapsed and separated by , and used to replace the NA in dataset1. Any suggestion on accomplishing this is much appreciated. Thanks in advance.


Solution

  • Collapse df2 value and join it with df1 by 'ID'. Use coalesce to use non-NA value from the two state columns.

    library(dplyr)
    
    df1 %>%
      left_join(df2 %>%
                  group_by(ID) %>%
                  summarise(State = toString(State)), by = 'ID') %>%
      mutate(State = coalesce(State.x, State.y)) %>%
      select(-State.x, -State.y)
    
    #  ID Weight  State
    #1  1   12.3     WY
    #2  2   11.2     IA
    #3  2   13.1     IN
    #4  3   12.7     MA
    #5  4   10.9 OR, CA
    #6  5   14.1     FL
    

    In base R with merge and transform.

    merge(df1, aggregate(State~ID, df2, toString), by = 'ID') |>
      transform(State = ifelse(is.na(State.x), State.y, State.x))