Search code examples
rdata-cleaning

What is the best way to merge two numeric variables with missing values?


Apologies in advance for a probably very basic question from a fresh R-user: I have two different numeric variables for age (age1, age2), they both contain some missing values. I want to create one variable that contains all the values (all the non-NA-values) of the two other variables, but struggle to get this done. As an example, I take these two columns

age1 <- c(NA, 21, 22, NA, 24, 25, NA, NA)
age2 <- c(20, NA, NA, 23, NA, NA, 26, NA)
dt <- data.frame(age1, age2)

I struggle to merge them into one column, trying to get a result like this:

dt$age <- c(20, 21, 22, 23, 24, 25, 26, NA)

I have tried things like:

dt$age[complete.cases(dt$age1)] <- dt$age1
dt$age[complete.cases(dt$age2)] <- dt$age2

or

dt$age[dt$age1>0] <- dt$age1
dt$age[dt$age2>0] <- dt$age2

or

dt$age[na.omit(dt$age1)] <- dt$age1
dt$age[na.omit(dt$age2)] <- dt$age2

But get error messages such as:

  • number of items to replace is not a multiple of replacement length
  • NAs are not allowed in subscripted assignments

I feel there should be some easy way to do this, but struggle to find out of it. Any hints to a solution is greatly appreciated.

Greetings, Bjorn


Solution

  • Using base R you could do:

    dt$age <- ifelse(is.na(dt$age1), dt$age2, dt$age1)
    

    Or, using dplyr:

    coalesce(age1, age2)
    

    This gives us the vector:

    [1] 20 21 22 23 24 25 26 NA
    

    If you want to create an age variable in your dt dataframe:

    dt %>% 
      mutate(age = coalesce(age1, age2))
    

    Which gives us:

      age1 age2 age
    1   NA   20  20
    2   21   NA  21
    3   22   NA  22
    4   NA   23  23
    5   24   NA  24
    6   25   NA  25
    7   NA   26  26
    8   NA   NA  NA