I want to build a frequency table for the rows of a data frame.
I have found how to do it, but only in a way that takes the order of the columns into account. I want to find the frequencies while ignoring the column order.
For example, given:
0 A B
1 B A
2 C D
3 D C
4 C D
I wish to obtain:
A B 2
C D 3
Thanks in advance.
We can use pmin/pmax to create the grouping variables, which should be more efficient than sorting each row.
library(dplyr)
df %>%
    count(V2N = pmin(V2, V3), V3N = pmax(V2, V3))
# A tibble: 2 x 3
#  V2N   V3N       n
#  <chr> <chr> <int>
#1 A     B         2
#2 C     D         3
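If dplyr is not available, the same idea can be sketched in base R. This is only an illustration under a few assumptions: the columns are named V2 and V3 as in the data used in this answer, they are character vectors rather than factors, and the helper names `lo`, `hi`, and `res` are made up for the example. pmin() puts the alphabetically smaller value of each row first and pmax() the larger one second, so "A B" and "B A" collapse into the same pair before counting.

```r
# Example data from the question (stringsAsFactors = FALSE keeps V2/V3 as
# character, which pmin()/pmax() need here)
df <- data.frame(V1 = 0:4,
                 V2 = c("A", "B", "C", "D", "C"),
                 V3 = c("B", "A", "D", "C", "D"),
                 stringsAsFactors = FALSE)

# Canonical order within each row: smaller value first, larger value second
lo <- pmin(df$V2, df$V3)
hi <- pmax(df$V2, df$V3)

# Count identical (lo, hi) pairs by summing a column of ones per group
res <- aggregate(n ~ lo + hi, data = data.frame(lo, hi, n = 1L), FUN = sum)
res
```

Note that pmin/pmax only cover the two-column case; with three or more columns each row would need a full sort instead.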
# Benchmark: repeat the 5 example rows 1e6 times (5 million rows)
df1 <- df[rep(seq_len(nrow(df)), 1e6),]
# pmin/pmax approach
system.time({
    df1 %>%
        count(V2N = pmin(V2, V3), V3N = pmax(V2, V3))
})
#user system elapsed
# 1.164 0.043 1.203
# Row-wise sort with apply(), for comparison
system.time({
    df2 <- data.frame(t(apply(df1[-1], 1, sort)))
    df2 %>%
        group_by_all() %>%
        summarise(Freq = n())
})
# user system elapsed
# 160.357 1.227 161.544
# Data used above
df <- structure(list(V1 = 0:4, V2 = c("A", "B", "C", "D", "C"), V3 = c("B",
"A", "D", "C", "D")), row.names = c(NA, -5L), class = "data.frame")