I have a data frame that contains measurements for multiple groups. I want to apply outer
to each pair of groups and compute a summary. Can it be done in Tidyverse style? Below is an example of how to do it with two loops. I have played around with expand_grid
and group_by
, but without success.
df <- iris
groups = as.character(levels(df$Species))
n <- length(groups)
out <- matrix(numeric(n*n), nrow = n, ncol = n)
colnames(out) <- groups
rownames(out) <- groups
for (sp1 in 1:n) {
for (sp2 in 1:n) {
out[sp1, sp2] <- sum(outer(
unlist(subset(df, Species == groups[sp1], Sepal.Length)),
unlist(subset(df, Species == groups[sp2], Sepal.Length)),
">"))
}
}
out
setosa versicolor virginica
setosa 1128 139 34
versicolor 2302 1169 479
virginica 2457 1927 1165
1) Base R With these sorts of calculations base R is probably more appropriate. We can split Sepal.Length
into a 3 element list s
, one component per Species, and then use outer
or a double sapply
.
s <- with(iris, split(Sepal.Length, Species))
outer(s, s, Vectorize(\(x, y) sum(outer(y, x, "<"))))
## setosa versicolor virginica
## setosa 1128 139 34
## versicolor 2302 1169 479
## virginica 2457 1927 1165
# alternately replace the line with outer with this
sapply(s, \(u) sapply(s, \(v) sum(outer(u, v, "<"))))
2) magrittr We could make this into a magrittr pipeline like this:
library(magrittr)
iris %>%
with(split(Sepal.Length, Species)) %>%
outer(., ., Vectorize(\(x, y) sum(outer(y, x, "<"))))
3) tidyverse or to go full tidyverse
library(dplyr)
library(tidyr)
iris %>%
select(Sepal.Length, Species) %>%
expand_grid(a = ., b = .) %>%
unnest(c(a, b), names_sep = ".") %>%
summarize(value = sum(a.Sepal.Length < b.Sepal.Length),
.by = ends_with("Species")) %>%
pivot_wider(id_cols = b.Species, names_from = a.Species) %>%
rename(Species = b.Species)
## # A tibble: 3 × 4
## Species setosa versicolor virginica
## <fct> <int> <int> <int>
## 1 setosa 1128 139 34
## 2 versicolor 2302 1169 479
## 3 virginica 2457 1927 1165
4) tapply Another base R approach is via tapply
like this:
with(iris, tapply(kronecker(Sepal.Length, Sepal.Length, "<"),
expand.grid(Species = Species, Species = Species), sum))
## Species
## Species setosa versicolor virginica
## setosa 1128 139 34
## versicolor 2302 1169 479
## virginica 2457 1927 1165