I'm looking at how value X changes in grouped data over time (t). X starts at zero for most observations and then changes randomly as time progresses. At each time-point, I'd like to find out how many obeservations from each group have the mode value for that group - ideally excluding the zero values. The data looks something like below but with more, bigger groups and more t columns.
group_name t1 t10 t50 t100
1 s3 0 259 187 122
2 s1 29 25 23 15
3 s3 0 259 23 122
4 s2 0 36 24 15
5 s1 29 25 23 15
6 s2 0 32 24 15
Ultimately, I would like to plot how many observations have the mode value for their respective group as a function of t but I don't know how to crunch the data for this with efficient R code.
I've seen there are several ways to calculate the mode for each group at a single time-point (e.g. here) but I don't know how to adapt these to calculate the number equal to that mode, or which would be the most efficient method to scale up for multiple t columns.
Thanks for any suggestions!
We could group by 'group_name', and summarise
across
the rest of the columns (everything()
) by applying the Mode
function on a subset of rows by excluding the 0 values (.[. != 0]
), create a logical vector (==
) with the elements of the column and get the sum
to find the frequency for each column by the grouping variable
library(dplyr)
df1 %>%
group_by(group_name) %>%
summarise(across(everything(), ~ sum(Mode(.[. !=0]) == ., na.rm = TRUE)))
# A tibble: 3 x 5
# group_name t1 t10 t50 t100
# <chr> <int> <int> <int> <int>
#1 s1 2 2 2 2
#2 s2 0 1 2 2
#3 s3 0 2 1 2
Or using data.table
library(data.table)
setDT(df1)[, lapply(.SD, function(x) sum(Mode(x[x != 0]) == x, na.rm = TRUE)),
by = group_name]
where
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
If we need to calculate across the 't' columns, reshape to 'long' format (pivot_longer
), filter
out the 0 values, grouped by 'group_name', summarise
with the frequency of 'Mode' values
library(tidyr)
df1 %>%
pivot_longer(cols = starts_with('t')) %>%
filter(value != 0) %>%
group_by(group_name) %>%
summarise(n_Mode = sum(Mode(value) == value))