Tags: r, dplyr, tidyr, data-manipulation

Adding factor levels with no data (zero values) to a data frame


I have this df:

  Value Quantity Percentage 
1 One         18      0.409     
2 Three        2      0.045     
3 Five        24      0.545     
4 Total       44      0.999     

The Value column is a factor with six levels:

> levels(df$Value)
[1] "One" "Two" "Three" "Four" "Five"            
[6] "Total"    
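A factor keeps its declared levels even when some of them have no rows, so the levels missing from the data can be found by comparing levels() with the values actually present. The sketch below rebuilds the example data frame from the tables above (assuming Quantity and Percentage are plain numeric columns) and uses base R's setdiff() for the check.

```r
# Reproducible copy of the example; the level order is taken from
# levels(df$Value) in the question. Numeric column types are an assumption.
all_levels <- c("One", "Two", "Three", "Four", "Five", "Total")
df <- data.frame(
  Value      = factor(c("One", "Three", "Five", "Total"), levels = all_levels),
  Quantity   = c(18, 2, 24, 44),
  Percentage = c(0.409, 0.045, 0.545, 0.999)
)

# Levels declared on the factor that have no row in the data;
# setdiff() keeps the order of its first argument, so this is "Two", "Four"
missing_levels <- setdiff(levels(df$Value), as.character(df$Value))
missing_levels
```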

After creating the data frame above, I want to add rows for the factor levels that have no value in it, because I need to plot this table and show which Value has Quantity == 0. The desired result is:

  Value Quantity Percentage 
  One         18      0.409     
  Two          0      0
  Three        2      0.045    
  Four         0      0
  Five        24      0.545     
  Total       44      0.999   

However, I don't want a solution that only handles Two and Four in this example, because other situations can also occur:

  • All the levels may have a Quantity > 0, or perhaps only two of them do. So I'm looking for a solution that checks which factor levels are missing from the data frame (because their Quantity is 0) and adds each of them with Quantity = 0, as in the desired output above.
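One base R way to handle the general case, offered here as an alternative to a tidyr answer, is to look up every level with match(): levels that have no row produce NA indices, which turn into all-NA rows that are then filled with 0. add_missing_levels() is a hypothetical helper written for this sketch, and it assumes each level appears at most once, as in a summary table like this one.

```r
all_levels <- c("One", "Two", "Three", "Four", "Five", "Total")
df <- data.frame(
  Value      = factor(c("One", "Three", "Five", "Total"), levels = all_levels),
  Quantity   = c(18, 2, 24, 44),
  Percentage = c(0.409, 0.045, 0.545, 0.999)
)

# Hypothetical helper: one row per level of `col`, in level order.
# Any NA in a numeric column (including the new rows) is replaced with 0.
add_missing_levels <- function(data, col = "Value") {
  lv  <- levels(data[[col]])
  idx <- match(lv, as.character(data[[col]]))  # NA where a level has no row
  out <- data[idx, , drop = FALSE]             # an NA index gives an all-NA row
  out[[col]] <- factor(lv, levels = lv)        # put the level names back
  num <- vapply(out, is.numeric, logical(1))
  out[num] <- lapply(out[num], function(x) replace(x, is.na(x), 0))
  rownames(out) <- NULL
  out
}

res <- add_missing_levels(df)
res
```

Because the rows are generated from levels(), the result follows level order no matter which levels are missing, and a data frame that already has every level comes back with the same six rows.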

Solution

  • Here is a possible solution:

    library(tidyr)
    
    # declare all six levels on Value, including those with no rows yet
    df$Value <- factor(df$Value, levels = c("One", "Two", "Three", "Four", "Five", "Total"))
    
    # add a row for every missing level and fill the numeric columns with 0;
    # complete() returns a new object, so assign the result back to df
    df <- complete(df, Value, fill = list(Quantity = 0, Percentage = 0))
    
    df
    # A tibble: 6 x 3
      Value Quantity Percentage
      <fct>    <dbl>      <dbl>
    1 One         18      0.409
    2 Two          0      0    
    3 Three        2      0.045
    4 Four         0      0    
    5 Five        24      0.545
    6 Total       44      0.999
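To check that the complete() approach is not tied to Two and Four, the sketch below (assuming tidyr is installed) runs the same call on the example data, on its already-completed result, and on a two-row subset of it. Each result should have one row per level, because complete() expands a factor over all of its levels rather than only the values that happen to be present.

```r
library(tidyr)

all_levels <- c("One", "Two", "Three", "Four", "Five", "Total")
df <- data.frame(
  Value      = factor(c("One", "Three", "Five", "Total"), levels = all_levels),
  Quantity   = c(18, 2, 24, 44),
  Percentage = c(0.409, 0.045, 0.545, 0.999)
)
fill_zero <- list(Quantity = 0, Percentage = 0)

# Two levels missing (the case in the question)
filled <- complete(df, Value, fill = fill_zero)

# Nothing missing: every level already has a row, so no rows are added
refilled <- complete(filled, Value, fill = fill_zero)

# Only two levels present (a subset of the example data); subsetting keeps
# all factor levels, so the other four are added with zeros
two_rows <- df[df$Value %in% c("One", "Total"), ]
two_filled <- complete(two_rows, Value, fill = fill_zero)

sapply(list(filled, refilled, two_filled), nrow)
```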