I have a data frame like df:
df <- data.frame(yr = c("2008", "2008", "2008", "2008", "2008", "2009", "2009"),
FM = c(1, 1, 1, 1, 0, 1, 0),
t1 = c(0, 1, 0, 1, 0, 1, 0),
t2 = c(0, 0, 1, 1, 0, 0, 1))
I would like to get 3 different statistisc:
I do not know how to get these statistics in just one table, and perform these sequential calculations in just one code (I need for code clarity and because my dataset is enormous).
The independent steps would be as follows
df %>%
group_by(yr, FM) %>%
summarise(count_FM = n()) %>%
mutate(perc_FM = count_FM/sum(count_FM))
df %>%
filter(FM=="1") %>%
group_by(yr, t1) %>%
summarise(count_t1 = n()) %>%
mutate(perc_t1 = count_t1/sum(count_t1))
df %>%
filter(FM=="1" & t1=="1") %>%
group_by(yr, t2) %>%
summarise(count_t2 = n()) %>%
mutate(perc_t2 = count_t2/sum(count_t2))
Any idea?
The problem is that your summaries are based on different grouping levels, so to get an output suitable for printing you will need to organize your data into a nested table. R's data frames and tibbles do not lend themselves to such a structure, though it is possible with a bit of work and converting all the columns to character columns:
library(tidyverse)
nest(df, data = -(yr:FM)) %>%
mutate(count_FM = unlist(map(data, nrow))) %>%
group_by(yr) %>%
mutate(perc_FM = count_FM / sum(count_FM)) %>%
summarize(FM, count_FM, perc_FM, data, .groups = "keep",
count_t1 = map(data, ~ .x %>%
count(t1, name = "count_t1") %>%
mutate(perc_t1 = count_t1 / sum(count_t1)))) %>%
unnest(count_t1) %>%
summarize(FM, count_FM, perc_FM, t1, count_t1, perc_t1, .groups = "keep",
count_t2 = map(data, ~ .x %>%
count(t2, name = "count_t2") %>%
mutate(perc_t2 = count_t2 / sum(count_t2)))) %>%
unnest(count_t2) %>%
mutate(across(t2:perc_t2, ~ifelse(t1 == 0, "", as.character(.x)))) %>%
group_by(yr, FM, t1) %>%
mutate(across(count_t1:perc_t1, ~c(first(.x), rep("", n() - 1))),
t1 = c(first(t1), rep("", n() - 1))) %>%
group_by(yr, FM) %>%
mutate(across(count_FM:perc_FM, ~c(first(.x), rep("", n() - 1))),
FM = c(first(FM), rep("", n() - 1)),
yr = c(first(yr), rep("", n() - 1))) %>%
as.data.frame()
#> yr FM count_FM perc_FM t1 count_t1 perc_t1 t2 count_t2 perc_t2
#> 1 2008 1 4 0.8 0 2 0.5
#> 2
#> 3 1 2 0.5 0 2 0.5
#> 4 1 2 0.5
#> 5 2008 0 1 0.2 0 1 1
#> 6 2009 1 1 0.5 1 1 1 0 1 1
#> 7 2009 0 1 0.5 0 1 1
Created on 2023-02-21 with reprex v2.0.2