Consider this "melted" tibble:
structure(list(country = c("Canada", "US", "Canada"), gender = c("male", "male", "female"), count = c(1, 5, 3)), row.names = c(NA, -3L ), class = c("tbl_df", "tbl", "data.frame"))
country | gender | count |
---|---|---|
Canada | male | 1 |
US | male | 5 |
Canada | female | 3 |
So Canada has a count for males and a count for females, while the US only has a count for males. This implicitly means that US x female count is zero. Since it's 2 countries times 2 genders, the number of rows should be 4, but the row with zero count is omitted. Is there a way to add a row US-female-0 to "complete" this tibble?
A simple way to do it is to reshape to wide and then to long again:
df %>%
pivot_wider(names_from = gender, values_from = count) %>%
mutate(female = ifelse(is.na(female), 0, female)) %>%
reshape2::melt("country") %>%
tibble
But this is long to write, it may be slow, and is complex if the widened data is bigger than this.
Another possibility, which seems smarter, is:
cross_join(df %>% distinct(country), df %>% distinct(gender)) %>%
left_join(df) %>%
mutate(count = ifelse(is.na(count), 0, count))
But this is not easy to insert in the middle of a pipe, for example in case df
itself comes from a pipe, without breaking into multiple operations or using global assignment.
Is there a simple way to do this "completion" of the data that is reasonably fast, scalable, and works inside a pipe?
tidyr::complete(df, country, gender, fill = list(count = 0))
# A tibble: 4 × 3
country gender count
<chr> <chr> <dbl>
1 Canada female 3
2 Canada male 1
3 US female 0
4 US male 5