I have 3 columns of data that I'd like to use to populate column D (example below of end result, currently column D is blank). For each row, columns A-C will have 1 populated value and 2 NA values. How can I do this?
A B C D
1 'a' NA NA 'a'
2 NA 'b' NA 'b'
3 NA 'b' NA 'b'
4 NA NA 'c' 'c'
5 NA NA 'c' 'c'
6 'a' NA NA 'a'
7 'a' NA NA 'a'
8 NA NA 'c' 'c'
I tried the following function and although it didn't error out, it didn't populate my dataset. I'm wondering what I'm doing wrong.
Thanks for your help
pop_D <- function(dataset){
for(i in 1:nrow(dataset)){
if(!is.na(dataset[i,'A'])){
dataset[i,'D'] <- dataset[i,'A']
}else if(!is.na(dataset[i,'B'])){
dataset[i,'D'] <- dataset[i,'B']
}else{
dataset[i,'D'] <- dataset[i,'C']
}
}
}
pop_D(ds)
Considering dft
as your input, you can use dplyr
and do:
dft %>%
mutate(D = coalesce(A,B,C))
which gives:
A B C D
1 a <NA> <NA> a
2 <NA> b <NA> b
3 <NA> b <NA> b
4 <NA> <NA> c c
5 <NA> <NA> c c
6 a <NA> <NA> a
7 a <NA> <NA> a
8 <NA> <NA> c c
p.s. I prepared the sample input data by copying from the question as:
dft <- read.table(header = TRUE, text = "id A B C D
1 'a' NA NA 'a'
2 NA 'b' NA 'b'
3 NA 'b' NA 'b'
4 NA NA 'c' 'c'
5 NA NA 'c' 'c'
6 'a' NA NA 'a'
7 'a' NA NA 'a'
8 NA NA 'c' 'c'",stringsAsFactors=FALSE)
dft$id<- NULL
dft$D <- NULL