Search code examples
rmatchsubsetcol

Select non NA values in match when matching columns


I have two data.frames, aa and bb:

    aa <- data.frame(ID = c(rep(letters[1:4]),"d","d", "e","e"), Name = c("Dave", "Charlie", "Ella", NA,"Timothy","James",NA, "Bob"))
aa
  ID    Name
1  a    Dave
2  b Charlie
3  c    Ella
4  d    <NA>
5  d Timothy
6  d   James
7  e    <NA>
8  e     Bob

bb <- data.frame(ID = rep(letters[1:5]),Name = 0)

 bb
  ID Name
1  a    0
2  b    0
3  c    0
4  d    0

I want to fill bb$Name by matching bb$ID with aa$ID so that I get the earliest name in the alphabet (in this case James):

 ID    Name
1  a    Dave
2  b Charlie
3  c    Ella
4  d   James
5  e    Bob

However, if I try bb$Name <- aa[match(bb$ID,aa$ID),2] I get:

bb

     ID    Name
1  a    Dave
2  b Charlie
3  c    Ella
4  d    <NA>
5  e    <NA>

Do you think some if statement code along the lines of: if there are more than 1 of the same values in aa$ID, and if one of those values are NA, then select the one that is of earlier alphabetical order to match with. So the d value in bb$Name would be James

I understand I could subset the original data but I don't want to do that in my actual data as the information that aa$Name has an NA value is important


Solution

  • Is it ok to create a temporary copy of aa?

    tmp <- aa[!is.na(aa$Name),]
    tmp <- tmp[order(tmp$Name),]
    bb$Name <- tmp[match(bb$ID,tmp$ID),2]
    bb
    

    This gets you the result you wanted.