Search code examples
rtidyversefill

filling missing values using tidyverse syntax based on conditions within the tibble


I'm having difficulty filling missing values in my tibble based on some conditions.

I'm trying to fill NA values in the manufacturer column based on the condition field either being YES or NO, I'm only interested in filling NA values for groups where the condition column contains YES.

I'm getting the following error which I can't figure out:

no applicable method for 'fill' applied to an object of class "character"

Here's a simple reprex which hopefully provides some context to the problem I'm trying to solve.

df <- tibble(
  code = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  cost = c(5000, 4000, 3000, 2000, 40000, 30000, 20000, 10000, 5000),
  manufacturer = c("ManA", NA, NA, NA, "ManB", "ManB", NA, NA, "ManB"),
  condition = c("NO", "NO", "NO", "NO", "YES", "YES", "YES", "YES", "YES")
) %>%
  group_by(code) %>%
  arrange(desc(cost), .by_group = TRUE) %>%
  mutate(manufacturer = if_else(condition == "YES", fill(manufacturer, .direction = "down"), manufacturer))

I've included the mutate in the last line to highlight what I've tried so far to solve for this but is causing the error above.

Any help that can be provided would be greatly appreciated.


Solution

  • You can create a copy of manufacturer column, fill everything and then based on condition assign the original values back.

    library(dplyr)
    library(tidyr)
    
    df %>%
      #Create a copy of the column
      mutate(new_manufacturer = manufacturer) %>%
      group_by(code) %>%
      arrange(desc(cost), .by_group = TRUE) %>%
      # Fill everything
      fill(new_manufacturer, .direction = "down") %>%
      # Keep the new value only if condition is "YES"
      mutate(new_manufacturer = if_else(condition == "YES", 
                     new_manufacturer, manufacturer)) %>%
      ungroup()
    
    #  code   cost manufacturer condition new_manufacturer
    #  <chr> <dbl> <chr>        <chr>     <chr>           
    #1 A      5000 ManA         NO        ManA            
    #2 A      4000 NA           NO        NA              
    #3 A      3000 NA           NO        NA              
    #4 A      2000 NA           NO        NA              
    #5 B     40000 ManB         YES       ManB            
    #6 B     30000 ManB         YES       ManB            
    #7 B     20000 NA           YES       ManB            
    #8 B     10000 NA           YES       ManB            
    #9 B      5000 ManB         YES       ManB            
    

    new_manufacturer column is the output that you are looking for.