I'm looking at some code:
df1 <- inner_join(metadata, otu_counts, by = "sample_id") %>%
  inner_join(., taxonomy, by = "otu") %>%
  group_by(sample_id) %>%
  mutate(rel_abund = count / sum(count)) %>%
  ungroup() %>%
  select(-count)
I completely understand this first chunk, but I'm new, and I can only assume that the `.groups = "drop"` in the second chunk does the same thing as the ungroup() in the first.
If so, is that because the last function in each step is a summarize() call?
df2 <- df1 %>%
  filter(level == "phylum") %>%
  group_by(disease_stat, sample_id, taxon) %>%
  summarize(rel_abund = sum(rel_abund), .groups = "drop") %>% #
  group_by(disease_stat, taxon) %>%
  summarize(mean_rel_abund = 100 * mean(rel_abund), .groups = "drop")
Can someone explain?
UPDATE: I realize that the first .groups = "drop" eliminates a newly created variable which was sample_id. Is there more to this?
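This is a special behavior/capability of summarize. When you group data by multiple variables, summarize by default drops only the last grouping variable and keeps the others as groups in the output data frame. With two grouping variables, as in the example here, that means the first one is kept.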
library(wec)
library(dplyr)
data(PUMS)
PUMS %>%
  group_by(race, education.cat) %>%
  summarise(hi = mean(wage))
# # A tibble: 8 × 3
# # Groups: race [4]
# race education.cat hi
# <fct> <fct> <dbl>
# 1 Hispanic High school 35149.
# 2 Hispanic Degree 52344.
# 3 Black High school 30552.
# 4 Black Degree 48243.
# 5 Asian High school 35350
# 6 Asian Degree 78213.
# 7 White High school 38532.
# 8 White Degree 69135.
Notice that the above data frame still has 4 groups: it is still grouped by race. If you use the .groups = "drop" argument in summarize, the output numbers are identical but the data frame has no groups.
PUMS %>%
  group_by(race, education.cat) %>%
  summarise(hi = mean(wage), .groups = "drop")
# # A tibble: 8 × 3
# race education.cat hi
# <fct> <fct> <dbl>
# 1 Hispanic High school 35149.
# 2 Hispanic Degree 52344.
# 3 Black High school 30552.
# 4 Black Degree 48243.
# 5 Asian High school 35350
# 6 Asian Degree 78213.
# 7 White High school 38532.
# 8 White Degree 69135.
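If you want to check the grouping directly rather than read it off the printed header, dplyr's group_vars() reports it. Here is a minimal sketch with three grouping variables, shaped like the question's pipeline. The `toy` data frame and its values are invented purely for illustration; only the column names are borrowed from the question.

```r
library(dplyr)

# Hypothetical toy data, made up for this illustration only.
toy <- tibble(
  disease_stat = c("healthy", "healthy", "case", "case"),
  sample_id    = c("s1", "s1", "s2", "s2"),
  taxon        = c("A", "B", "A", "B"),
  rel_abund    = c(0.6, 0.4, 0.3, 0.7)
)

grouped <- toy %>% group_by(disease_stat, sample_id, taxon)

# Default behavior: only the last grouping variable (taxon) is dropped.
default_out <- grouped %>% summarize(rel_abund = sum(rel_abund))
group_vars(default_out)   # "disease_stat" "sample_id"

# .groups = "drop": the result is a plain, ungrouped tibble.
dropped_out <- grouped %>% summarize(rel_abund = sum(rel_abund), .groups = "drop")
group_vars(dropped_out)   # character(0)

# .groups = "keep": every original grouping variable is retained.
kept_out <- grouped %>% summarize(rel_abund = sum(rel_abund), .groups = "keep")
group_vars(kept_out)      # "disease_stat" "sample_id" "taxon"
```

Note that no column is removed in any of the three cases. `.groups` only changes the grouping metadata attached to the result, so sample_id is still a column after `.groups = "drop"`.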
The mutate function in the first of your examples does not have a built-in .groups functionality, so you have to take an extra line to ungroup() if you wish to do so afterwards.
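To see the mutate side of this, here is a small sketch, again on a made-up `toy` data frame (hypothetical values; column names taken from the question). It shows that the grouping survives mutate until ungroup() is called, and that a later group_by() replaces whatever grouping is already there.

```r
library(dplyr)

# Hypothetical toy counts, made up for this illustration only.
toy <- tibble(
  sample_id = c("s1", "s1", "s2", "s2"),
  otu       = c("otu1", "otu2", "otu1", "otu2"),
  count     = c(30, 70, 10, 40)
)

# mutate() computes within groups but leaves the grouping in place.
still_grouped <- toy %>%
  group_by(sample_id) %>%
  mutate(rel_abund = count / sum(count))
group_vars(still_grouped)  # "sample_id"

# ungroup() removes the grouping explicitly.
ungrouped <- still_grouped %>% ungroup()
group_vars(ungrouped)      # character(0)

# By default, group_by() replaces existing groups instead of adding to them.
regrouped <- still_grouped %>% group_by(otu)
group_vars(regrouped)      # "otu"
```

Because group_by() overrides existing groups by default, the first `.groups = "drop"` in your df2 pipeline should not change the final numbers: the following group_by(disease_stat, taxon) would reset the grouping either way. As far as I can tell, it mainly makes the intent explicit and avoids the informational message summarize prints when `.groups` is left unspecified.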