Say I have a vector vec <- c("H", "H", "H", "H", "M", "M", "A", "A").
How do I get all combinations/permutations when drawing, e.g., 5 out of 8? The expected output is:
> head(t, 6)
[,1] [,2] [,3] [,4] [,5]
[1,] "H" "H" "H" "H" "M"
[2,] "H" "H" "H" "H" "M"
[3,] "H" "H" "H" "H" "A"
[4,] "H" "H" "H" "H" "A"
[5,] "H" "H" "H" "M" "M"
[6,] "H" "H" "H" "M" "A"
I tried gtools::combinations(), but I always get an error saying there are too few different elements (the same happens with gtools::permutations()), regardless of whether repeats are allowed or not:
t <- gtools::combinations(8, 5, vec, repeats.allowed = F)
Error in gtools::combinations(8, 5, vec, repeats.allowed = F) :
too few different elements
So I did it the laborious way:
t <- gtools::combinations(8, 5, letters[1:8], repeats.allowed = FALSE)
# relabel the placeholder letters: a-d become "H", e-f become "M", g-h become "A"
for (i in 1:8) {
  if (i <= 4) {
    t[t == letters[i]] <- "H"
  } else if (i <= 6) {
    t[t == letters[i]] <- "M"
  } else {
    t[t == letters[i]] <- "A"
  }
}
I am looking for an easier solution, from any package or base R, and would like to know why my attempt doesn't work. Thanks in advance.
When you need combinations/permutations of a vector that contains repeated elements (i.e. a multiset), many of the functions available in base R and other packages will produce duplicate results that then have to be filtered out. For small problems this is not an issue; however, the approach quickly becomes impractical as the input grows.
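To make this concrete, here is a base R sketch of the filtering approach (the object names are illustrative). combn() works on positions, so it treats the four "H"s as distinct and returns all choose(8, 5) = 56 combinations, many of them repeated. Sorting each row and calling unique() leaves the 8 distinct multisets. The first six rows of all_combs also match the expected output in the question, without the relabelling loop.

```r
vec <- c("H", "H", "H", "H", "M", "M", "A", "A")

# combn() picks positions, so identical values produce identical rows
all_combs <- t(combn(vec, 5))
dim(all_combs)        # 56 rows, 5 columns
head(all_combs, 6)    # same rows as the expected output in the question

# sort each row so equal multisets compare equal, then drop duplicate rows
sorted_combs <- t(apply(all_combs, 1, sort))
distinct_combs <- unique(sorted_combs)
nrow(distinct_combs)  # 8
```

This is fine at this size, but the number of raw combinations grows as choose(n, k) no matter how few distinct results there are.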
Currently, there are a couple of packages capable of handling these types of problems: arrangements and RcppAlgos (I am the author).
vec <- c("H", "H", "H", "H", "M", "M", "A", "A")
tbl_v <- table(vec)
tbl_v
vec
A H M
2 4 2
library(RcppAlgos)
comboGeneral(names(tbl_v), 5, freqs = tbl_v)
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "A" "H" "H" "H"
[2,] "A" "A" "H" "H" "M"
[3,] "A" "A" "H" "M" "M"
[4,] "A" "H" "H" "H" "H"
[5,] "A" "H" "H" "H" "M"
[6,] "A" "H" "H" "M" "M"
[7,] "H" "H" "H" "H" "M"
[8,] "H" "H" "H" "M" "M"
## For package arrangements we have:
## arrangements::combinations(names(tbl_v), 5, freq = tbl_v)
Similarly, for permutations, we have:
permuteGeneral(names(tbl_v), 5, freqs = tbl_v)
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "A" "H" "H" "H"
[2,] "A" "A" "H" "H" "M"
[3,] "A" "A" "H" "M" "H"
[4,] "A" "A" "H" "M" "M"
. . . . . .
. . . . . .
. . . . . .
[137,] "M" "M" "H" "A" "A"
[138,] "M" "M" "H" "A" "H"
[139,] "M" "M" "H" "H" "A"
[140,] "M" "M" "H" "H" "H"
## For package arrangements we have:
## arrangements::permutations(names(tbl_v), 5, freq = tbl_v)
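As a sanity check on the 140 rows, a brute-force base R sketch can enumerate all 3^5 = 243 ordered sequences over "A", "H" and "M" with expand.grid() and keep only those that use no value more often than it appears in vec. This is exactly the generate-then-filter strategy, and it is workable here only because the problem is tiny.

```r
vec <- c("H", "H", "H", "H", "M", "M", "A", "A")
tbl_v <- table(vec)
vals <- names(tbl_v)             # "A" "H" "M"
max_counts <- as.vector(tbl_v)   # 2 4 2

# all 3^5 = 243 ordered sequences of length 5
grid <- as.matrix(expand.grid(rep(list(vals), 5), stringsAsFactors = FALSE))

# keep a sequence only if each value is used at most as often as it occurs in vec
fits <- apply(grid, 1, function(row) {
  used <- as.vector(table(factor(row, levels = vals)))
  all(used <= max_counts)
})
perms <- grid[fits, , drop = FALSE]
nrow(grid)   # 243
nrow(perms)  # 140
```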
Both packages contain algorithms that generate each distinct result directly, so no duplicates are produced and nothing needs to be filtered. This is much more efficient.
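For intuition about how results can be generated without filtering, here is a minimal recursive sketch in base R. The helper multiset_combos is hypothetical, and this is not the algorithm RcppAlgos or arrangements actually use. For the first distinct value, it chooses how many copies k to take (at most its frequency), then fills the remaining slots from the other values. Every branch yields a distinct combination, so nothing is ever discarded.

```r
# Hypothetical helper: combinations of a multiset built directly from counts.
# A simple recursion for illustration, not the RcppAlgos/arrangements algorithm.
multiset_combos <- function(values, freqs, m) {
  if (m == 0) return(list(character(0)))      # nothing left to choose
  if (length(values) == 0) return(list())     # slots left but no values: dead end
  out <- list()
  # take k copies of the first value, from as many as allowed down to none
  for (k in min(freqs[1], m):0) {
    rest <- multiset_combos(values[-1], freqs[-1], m - k)
    out <- c(out, lapply(rest, function(r) c(rep(values[1], k), r)))
  }
  out
}

vec <- c("H", "H", "H", "H", "M", "M", "A", "A")
tbl_v <- table(vec)
res <- do.call(rbind, multiset_combos(names(tbl_v), as.vector(tbl_v), 5))
nrow(res)  # 8, in the same order as the comboGeneral() output above
```

The recursion is only as deep as the number of distinct values (three here), so it also handles the larger example below: with frequencies 16, 32 and 16 and length 16 it returns 153 combinations.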
For example, suppose we had big_vec <- rep(vec, 8) and wanted all combinations of length 16. With the filtering approach, one would first need to generate every 16-element combination of a 64-element vector and then filter them. That is choose(64, 16) = 4.885269e+14 combinations, far too many to generate in practice.
With these two packages, this problem is a breeze.
big_vec <- rep(vec, 8)
tbl_big_v <- table(big_vec)
tbl_big_v
big_vec
A H M
16 32 16
system.time(test_big <- comboGeneral(names(tbl_big_v), 16,
freqs = tbl_big_v))
user system elapsed
0 0 0
dim(test_big)
[1] 153 16
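Why 153? A combination from a multiset is determined entirely by how many A, H and M it contains, so the result size is the number of non-negative triples (a, h, m) with a + h + m = 16 that respect the frequencies 16, 32 and 16. Since no bound is smaller than 16, every triple qualifies, and stars and bars gives choose(18, 2) = 153. A quick base R check of that count:

```r
# count the (a, h, m) splits of 16 that respect the available frequencies
splits <- expand.grid(a = 0:16, h = 0:16, m = 0:16)
feasible <- with(splits, a + h + m == 16 & a <= 16 & h <= 32 & m <= 16)
sum(feasible)   # 153
choose(18, 2)   # 153, the stars-and-bars count
```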