Search code examples
rtidyrdata-manipulation

Replace NA based on similar data in differend rows


I have a dataframe as follows: (first three columns, fourth shows what I want)

table

Type <- rep(c("fruit", "vegetable"), each=4)
Kind <- c("Reference", "apple", "apple", "banana", "Reference", "tomato", "tomato", "mushroom")
Expected <- c(3,2,NA, NA, 5,4,NA, NA)
wanted <- c(3,2,2,3, 5,4,4,5)

mydat <- data.frame(Type, Kind, Expected, wanted)

I need the column "Expected" to be filled based on the data in other rows. If a row as NA in Expected, I want the value from Expected to be taken from the closest similar row regarding Type and Kind (example NA with an apple: fruit -> apple -> take the value 2 from the expected mass for all apple NAs). If there is none available, as for banana, I want to take the value it can find in the closest Reference, in the banana example this would be fruit -> Reference. The column wanted shows the values I would like to receive. Is this possible?

Thanks for your help!


Solution

  • We can do this in 2 stages. First stage we fill by Type and Kind, second stage we fill again by Type, lumping together the "Reference" Kind with whatever still needs filling:

    mydat %>%
      group_by(Type, Kind) %>%
      fill(Expected, .direction = "downup") %>%
      group_by(Type, temp_kind = Kind == "Reference" | is.na(Expected)) %>% 
      fill(Expected, .direction = "downup") %>%
      ungroup() %>%
      select(-temp_kind)
    # # A tibble: 8 × 4
    #   Type      Kind      Expected wanted
    #   <chr>     <chr>        <dbl>  <dbl>
    # 1 fruit     Reference        3      3
    # 2 fruit     apple            2      2
    # 3 fruit     apple            2      2
    # 4 fruit     banana           3      3
    # 5 vegetable Reference        5      5
    # 6 vegetable tomato           4      4
    # 7 vegetable tomato           4      4
    # 8 vegetable mushroom         5      5