Apologies in advance for a probably very basic question from a fresh R-user: I have two different numeric variables for age (age1, age2), they both contain some missing values. I want to create one variable that contains all the values (all the non-NA-values) of the two other variables, but struggle to get this done. As an example, I take these two columns
age1 <- c(NA, 21, 22, NA, 24, 25, NA, NA)
age2 <- c(20, NA, NA, 23, NA, NA, 26, NA)
dt <- data.frame(age1, age2)
I struggle to merge them into one column, trying to get a result like this:
dt$age <- c(20, 21, 22, 23, 24, 25, 26, NA)
I have tried things like:
dt$age[complete.cases(dt$age1)] <- dt$age1
dt$age[complete.cases(dt$age2)] <- dt$age2
or
dt$age[dt$age1>0] <- dt$age1
dt$age[dt$age2>0] <- dt$age2
or
dt$age[na.omit(dt$age1)] <- dt$age1
dt$age[na.omit(dt$age2)] <- dt$age2
But get error messages such as:
I feel there should be some easy way to do this, but struggle to find out of it. Any hints to a solution is greatly appreciated.
Greetings, Bjorn
Using base R
you could do:
dt$age <- ifelse(is.na(dt$age1), dt$age2, dt$age1)
Or, using dplyr
:
coalesce(age1, age2)
This gives us the vector:
[1] 20 21 22 23 24 25 26 NA
If you want to create an age
variable in your dt
dataframe:
dt %>%
mutate(age = coalesce(age1, age2))
Which gives us:
age1 age2 age
1 NA 20 20
2 21 NA 21
3 22 NA 22
4 NA 23 23
5 24 NA 24
6 25 NA 25
7 NA 26 26
8 NA NA NA