
How to add a variable to a data frame, conditional on missing values and on another data frame at the same time?


I have these two data frames (imagine them much larger):

df = data.frame(subjects = 1:10,
                var1 = c('a',NA,'b',NA,'c',NA,'d','e','f','g'))

g = data.frame(subjects = c(1,3,5,7,8,9,10),
               score = c(1,2,1,3,2,4,1) )

I want to add the variable score from the g data frame to the df data frame, with the condition that if var1 is NA, then score in df is also NA. How can this be done with a simple function? Thanks.

Second scenario:

df = data.frame(subjects = 1:10,
                var1 = c('a','e','b','c','c','b','d','e','f','g'))

g = data.frame(subjects = c(1,3,5,7,8,9,10),
               score = c(1,2,1,3,2,4,1) )

Now I want the score for each subject whose score was not calculated to be NA, so that the result looks as follows:

df = data.frame(subjects = 1:10,
                var1 = c('a','e','b','c','c','b','d','e','f','g'),
                score = c(1,NA,2,NA,1,NA,3,2,4,1))



Solution

  • We can do a left join by 'subjects', which returns 'score' with NA wherever a subject has no corresponding row in 'g'. If 'score' must also be NA when 'var1' is NA, follow the join with a replace step that checks 'var1' for NA.

    library(dplyr)
    df <- left_join(df, g, by= "subjects") %>% 
        mutate(score = replace(score, is.na(var1), NA))
    

    -output

    df
       subjects var1 score
    1         1    a     1
    2         2    e    NA
    3         3    b     2
    4         4    c    NA
    5         5    c     1
    6         6    b    NA
    7         7    d     3
    8         8    e     2
    9         9    f     4
    10       10    g     1
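
Since the question asks for a simple function, the same two steps (look up the score by subject, then blank it where var1 is missing) can also be sketched in base R without dplyr. This is an alternative to the join above, not the answer's own method. The helper name add_score is hypothetical, and the sketch assumes each subject appears at most once in g, because match() only returns the first matching row.

```r
# Hypothetical helper (not part of the original answer); base R only.
# Assumes each subject occurs at most once in g.
add_score <- function(df, g) {
  # match() returns, for each df$subjects, its row position in g
  # (NA when the subject is absent), so the lookup yields NA there.
  df$score <- g$score[match(df$subjects, g$subjects)]
  # Set score to NA wherever var1 is missing.
  df$score[is.na(df$var1)] <- NA
  df
}

# First scenario from the question
df <- data.frame(subjects = 1:10,
                 var1 = c('a', NA, 'b', NA, 'c', NA, 'd', 'e', 'f', 'g'))
g <- data.frame(subjects = c(1, 3, 5, 7, 8, 9, 10),
                score = c(1, 2, 1, 3, 2, 4, 1))

res <- add_score(df, g)
res$score
# 1 NA 2 NA 1 NA 3 2 4 1
```

Unlike left_join, this version never changes the number of rows in df, which may or may not be what you want if g can contain repeated subjects.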