Search code examples
rjoindplyrtibble

Add rows to complete melted tibble


Consider this "melted" tibble:

structure(list(country = c("Canada", "US", "Canada"), gender = c("male", "male", "female"), count = c(1, 5, 3)), row.names = c(NA, -3L ), class = c("tbl_df", "tbl", "data.frame"))

country gender count
Canada male 1
US male 5
Canada female 3

So Canada has a count for males and a count for females, while the US only has a count for males. This implicitly means that US x female count is zero. Since it's 2 countries times 2 genders, the number of rows should be 4, but the row with zero count is omitted. Is there a way to add a row US-female-0 to "complete" this tibble?

A simple way to do it is to reshape to wide and then to long again:

df %>%
    pivot_wider(names_from = gender, values_from = count) %>% 
    mutate(female = ifelse(is.na(female), 0, female)) %>% 
    reshape2::melt("country") %>% 
    tibble

But this is long to write, it may be slow, and is complex if the widened data is bigger than this.

Another possibility, which seems smarter, is:

cross_join(df %>% distinct(country), df %>% distinct(gender)) %>%
    left_join(df) %>%
    mutate(count = ifelse(is.na(count), 0, count))

But this is not easy to insert in the middle of a pipe, for example in case df itself comes from a pipe, without breaking into multiple operations or using global assignment.

Is there a simple way to do this "completion" of the data that is reasonably fast, scalable, and works inside a pipe?


Solution

  • tidyr::complete(df, country, gender, fill = list(count = 0))
    
    # A tibble: 4 × 3
      country gender count
      <chr>   <chr>  <dbl>
    1 Canada  female     3
    2 Canada  male       1
    3 US      female     0
    4 US      male       5