Tags: r, dplyr, expand, levels

Making implicit missing values explicit in nested levels


I am trying to complete my data frame by adding rows for its missing levels.

Current output

id foo bar val
1   a   x   7
2   a   y   9
3   a   z   6
4   b   x  10
5   b   y   4
6   b   z   5
7   c   y   2

Data

df <- structure(list(id = c("1", "2", "3", "4", "5", "6", "7"), foo = c("a", 
"a", "a", "b", "b", "b", "c"), bar = c("x", "y", "z", "x", "y", 
"z", "y"), val = c("7", "9", "6", "10", "4", "5", "2")), .Names = c("id", 
"foo", "bar", "val"), row.names = c(NA, -7L), class = "data.frame")

I would like to make the missing nested levels of c explicit, with val set to 0 for x and z. I found a workaround with expand.grid, but I could not get the desired output with tidyr.

Desired output:

id foo bar val
1   a   x   7
2   a   y   9
3   a   z   6
4   b   x  10
5   b   y   4
6   b   z   5
7   c   x   0
8   c   y   2
9   c   z   0
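
For reference, an expand.grid-based workaround of the kind mentioned above might look roughly like the sketch below. It is an illustrative base R reconstruction, not the original code: the names grid and full are made up for the example, and converting val to numeric is an assumption.

```r
# Data from the question (all columns are character vectors)
df <- data.frame(
  id  = c("1", "2", "3", "4", "5", "6", "7"),
  foo = c("a", "a", "a", "b", "b", "b", "c"),
  bar = c("x", "y", "z", "x", "y", "z", "y"),
  val = c("7", "9", "6", "10", "4", "5", "2"),
  stringsAsFactors = FALSE
)

# Every foo/bar combination, built from the values present in the data
grid <- expand.grid(
  foo = unique(df$foo),
  bar = unique(df$bar),
  stringsAsFactors = FALSE
)

# Left join onto the grid; combinations absent from df get NA in val
full <- merge(grid, df[, c("foo", "bar", "val")],
              by = c("foo", "bar"), all.x = TRUE)

# Convert val to numeric (assumption) and replace the NAs with 0
full$val <- as.numeric(full$val)
full$val[is.na(full$val)] <- 0

# Sort by foo then bar, and rebuild a running id
full <- full[order(full$foo, full$bar), ]
full <- cbind(id = seq_len(nrow(full)), full)
rownames(full) <- NULL

print(full)
```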

Thanks in advance!


Solution

  • Given that you are looking for a tidyr solution, you should check out tidyr::complete (which does exactly what you are after):

    library(tidyverse)
    
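    # val is a character column here, so the fill value is given as the string "0";
    # recent tidyr versions may reject a numeric fill for a character column
    complete(df, foo, bar, fill = list(val = "0")) %>% select(-id)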
    #> # A tibble: 9 x 3
    #>   foo   bar   val  
    #>   <chr> <chr> <chr>
    #> 1 a     x     7    
    #> 2 a     y     9    
    #> 3 a     z     6    
    #> 4 b     x     10   
    #> 5 b     y     4    
    #> 6 b     z     5    
    #> 7 c     x     0    
    #> 8 c     y     2    
    #> 9 c     z     0
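
The call above drops id and leaves val as character, whereas the desired output has a running id from 1 to 9. A minimal sketch of one way to get closer to that output is shown below. It assumes val should be numeric and that id can simply be renumbered after completion; the name result is made up for the example.

```r
library(tidyr)
library(dplyr)

# Data from the question (all columns are character vectors)
df <- data.frame(
  id  = c("1", "2", "3", "4", "5", "6", "7"),
  foo = c("a", "a", "a", "b", "b", "b", "c"),
  bar = c("x", "y", "z", "x", "y", "z", "y"),
  val = c("7", "9", "6", "10", "4", "5", "2"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # assumption: val is meant to be numeric, so a numeric fill is type-compatible
  mutate(val = as.numeric(val)) %>%
  # drop the old ids; they are rebuilt after the missing rows are added
  select(-id) %>%
  # add a row for every foo/bar combination that is absent, with val = 0
  complete(foo, bar, fill = list(val = 0)) %>%
  # renumber the rows and restore the original column order
  mutate(id = row_number()) %>%
  select(id, foo, bar, val)

print(result)
```

Renumbering after complete() keeps the ids consecutive. If the original ids carry meaning, it may be better to keep them and leave id as NA in the added rows.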