
How to replace specific NA values in a column with a given string


This is probably very simple, but I could not figure it out.

df<-structure(list(Besti = c("Friend", "myfriend", "yourbest", "allbest"
), Friend = c("Friend", NA, "Friend", "Toofriend"), Val1 = c(0L, 
0L, 0L, 0L), Val2 = c(0L, 0L, 0L, 0L), Val3 = c(0L, 1L, 0L, 0L
), Val4 = c(0L, 0L, 0L, 0L), Val5 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-4L))

My data looks like this. I want to know how to replace an NA with a string when the value one row above and the value one row below are the same string.

I can confirm that there is an NA:

sum(is.na(df$Friend))

If the value one row above is Friend and the value one row below is also Friend, I want to replace the NA with Friend,

so the output looks like this:

df_out<-structure(list(Besti = c("Friend", "myfriend", "yourbest", "allbest"
), Friend = c("Friend", "Friend", "Friend", "Toofriend"), Val1 = c(0L, 
0L, 0L, 0L), Val2 = c(0L, 0L, 0L, 0L), Val3 = c(0L, 1L, 0L, 0L
), Val4 = c(0L, 0L, 0L, 0L), Val5 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-4L))

Now imagine I have 100 or more NAs in no particular order: the row before or after an NA may itself be NA, while the value two rows after is Friend or some other string.

If I just want to replace every NA with Friend, I can do this:

library(tidyr)  # provides replace_na() and re-exports %>%
df$Friend <- df$Friend %>% replace_na('Friend')

Solution

  • library(dplyr)
    df |>
      mutate(
        upper = lag(Friend),
        lower = lead(Friend),
        replacement = ifelse(upper == lower, upper, NA),
        Friend = coalesce(Friend, replacement)
      )
    #>      Besti    Friend Val1 Val2 Val3 Val4 Val5  upper     lower replacement
    #> 1   Friend    Friend    0    0    0    0    0   <NA>      <NA>        <NA>
    #> 2 myfriend    Friend    0    0    1    0    0 Friend    Friend      Friend
    #> 3 yourbest    Friend    0    0    0    0    0   <NA> Toofriend        <NA>
    #> 4  allbest Toofriend    0    0    0    0    0 Friend      <NA>        <NA>
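
As a quick check of the building blocks, here is a minimal sketch of what the shifting functions return on their own. The vector `x` and the filler value "z" are hypothetical and chosen only for illustration; they are not part of the question's data.

```r
library(dplyr)

# Hypothetical toy vector, used only to show the shifting behaviour
x <- c("a", NA, "a", "b")

lag(x)   # value one position above:  NA  "a" NA  "a"
lead(x)  # value one position below:  NA  "a" "b" NA

# coalesce() keeps the first non-missing value at each position
coalesce(x, rep("z", length(x)))  # "a" "z" "a" "b"
```

Only position 2 has the same non-missing value on both sides, which is why it is the only row that gets a replacement in the output above.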
    

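    dplyr::lag() and dplyr::lead() shift the vector Friend down and up by one position, so each row sees the value above it (upper) and the value below it (lower). We then test whether the two are equal and, if they are, use that value as the replacement. Where either neighbour is NA the comparison is NA, so no replacement is produced. dplyr::coalesce() fills each NA in Friend with the replacement value at the same position. This can be simplified to: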

    df |>
      mutate(
        # compare the value above with the value below (lead(), not tail())
        replacement = ifelse(lag(Friend) == lead(Friend), lag(Friend), NA),
        Friend = coalesce(Friend, replacement)
      )
    #>      Besti    Friend Val1 Val2 Val3 Val4 Val5 replacement
    #> 1   Friend    Friend    0    0    0    0    0        <NA>
    #> 2 myfriend    Friend    0    0    1    0    0      Friend
    #> 3 yourbest    Friend    0    0    0    0    0        <NA>
    #> 4  allbest Toofriend    0    0    0    0    0        <NA>
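
The approach above only looks one row up and one row down, so it does not cover the case mentioned in the question where several NAs sit next to each other. One possible way to handle such runs, sketched here under the assumption that an NA should be filled whenever the nearest non-missing value above and the nearest non-missing value below agree, is to use tidyr::fill() in both directions. The data frame `df2` and the helper columns `above` and `below` are hypothetical names introduced for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: a run of two NAs between matching values,
# and a single NA between two different values
df2 <- data.frame(
  Friend = c("Friend", NA, NA, "Friend", NA, "Toofriend")
)

out <- df2 |>
  mutate(above = Friend, below = Friend) |>
  fill(above, .direction = "down") |>  # nearest non-NA value above
  fill(below, .direction = "up") |>    # nearest non-NA value below
  mutate(
    # use the neighbouring value only when both sides agree
    replacement = if_else(above == below, above, NA_character_),
    Friend = coalesce(Friend, replacement)
  ) |>
  select(-above, -below, -replacement)

out$Friend
#> [1] "Friend"    "Friend"    "Friend"    "Friend"    NA          "Toofriend"
```

The NA in row 5 is left alone because its nearest neighbours (Friend above, Toofriend below) differ. NAs at the very start or end of the column also stay NA, since one side has no value to compare against.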