Search code examples
rdataframenacoalesce

Use for loop with nested if statement in R to populate column from non-NA values in dataframe


I have 3 columns of data that I'd like to use to populate column D (example below of end result, currently column D is blank). For each row, columns A-C will have 1 populated value and 2 NA values. How can I do this?

    A   B   C   D
1  'a' NA  NA  'a'
2  NA  'b' NA  'b'
3  NA  'b' NA  'b'
4  NA  NA 'c'  'c'
5  NA  NA 'c'  'c'
6  'a' NA  NA  'a'
7  'a' NA  NA  'a'
8  NA  NA 'c'  'c'

I tried the following function and although it didn't error out, it didn't populate my dataset. I'm wondering what I'm doing wrong.

Thanks for your help

pop_D <- function(dataset){
for(i in 1:nrow(dataset)){
    if(!is.na(dataset[i,'A'])){
    dataset[i,'D'] <- dataset[i,'A']
  }else if(!is.na(dataset[i,'B'])){
    dataset[i,'D'] <- dataset[i,'B']
  }else{
    dataset[i,'D'] <- dataset[i,'C']
  }
 }
}
pop_D(ds)

Solution

  • Considering dft as your input, you can use dplyr and do:

    dft %>%
      mutate(D = coalesce(A,B,C))
    

    which gives:

         A    B    C D
    1    a <NA> <NA> a
    2 <NA>    b <NA> b
    3 <NA>    b <NA> b
    4 <NA> <NA>    c c
    5 <NA> <NA>    c c
    6    a <NA> <NA> a
    7    a <NA> <NA> a
    8 <NA> <NA>    c c
    

    p.s. I prepared the sample input data by copying from the question as:

    dft <- read.table(header = TRUE, text = "id    A   B   C   D
    1  'a' NA  NA  'a'
    2  NA  'b' NA  'b'
    3  NA  'b' NA  'b'
    4  NA  NA 'c'  'c'
    5  NA  NA 'c'  'c'
    6  'a' NA  NA  'a'
    7  'a' NA  NA  'a'
    8  NA  NA 'c'  'c'",stringsAsFactors=FALSE)
    
    dft$id<- NULL
    dft$D <- NULL