Search code examples
rtidyrdata-manipulationreshape

how to use pivot wider if dataframe have any NA value


i have a dataframe like this

df = data.frame(day = c("1", NA, NA, NA, NA, "2", NA, NA, NA),
                Unit = c("unit1", NA, NA, NA, "unit2", "unit1", NA, NA, "unit2"),
                Problem = c("Oil", "Engine", "Electric", NA, NA, "Oil", "Power", NA, NA),
                duration = c(2, 5, 1, NA, NA, 1.5, 3, NA, NA))

row 1:5 is day1 and 6:9 day2, if there are duplicated values in the same column, the value is NA.

i tried to use

df %>% 
  pivot_wider(names_from = Problem, values_from = duration)

but its not worked, my expected df is like this

df1 = data.frame(day = c("1", "1", "2", "2"),
                 Unit = c("unit1", "unit2", "unit1", "unit2"),
                 Oil = c(2, 0, 1.5, 0),
                 Engine = c(5, 0, 0, 0),
                 Electric = c(1, 0, 0, 0),
                 Power = c(0, 0, 3, 0),
                 NoProblem = c(0, 0, 0, 0))

Solution

  • We fill the NA with previous non-NA elements, get the distinct rows, and reshape to wide with pivot_wider

    library(dplyr)
    library(tidyr)
    df %>% 
      fill(day, Unit, Problem) %>%
      distinct(day, Unit, Problem, .keep_all = TRUE) %>% 
      mutate(duration = replace_na(duration, 0)) %>% 
      pivot_wider(names_from = Problem, values_from = duration, 
       values_fill = 0) %>% 
      mutate(NoProblem = 0)
    

    -output

    # A tibble: 4 × 7
      day   Unit    Oil Engine Electric Power NoProblem
      <chr> <chr> <dbl>  <dbl>    <dbl> <dbl>     <dbl>
    1 1     unit1   2        5        1     0         0
    2 1     unit2   0        0        0     0         0
    3 2     unit1   1.5      0        0     3         0
    4 2     unit2   0        0        0     0         0