not sure if I chose a good title at all... Also I don't know if I use the right terminology, so maybe with the right search terms I would find a solution to this problem...
I have a list of strings from which I want to have all sets of "exclusive" combinations of 3.
Example: With the following
require(utils)
mylist<-c("strA","strB","strC","strD","strE","strF")
t(combn(mylist,3))
I get a table that lists all possible combinations of 3 out of those 6 strings (so each row represents one combination of 3):
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strA" "strB" "strD"
[3,] "strA" "strB" "strE"
[4,] "strA" "strB" "strF"
[5,] "strA" "strC" "strD"
[6,] "strA" "strC" "strE"
[7,] "strA" "strC" "strF"
[8,] "strA" "strD" "strE"
[9,] "strA" "strD" "strF"
[10,] "strA" "strE" "strF"
[11,] "strB" "strC" "strD"
[12,] "strB" "strC" "strE"
[13,] "strB" "strC" "strF"
[14,] "strB" "strD" "strE"
[15,] "strB" "strD" "strF"
[16,] "strB" "strE" "strF"
[17,] "strC" "strD" "strE"
[18,] "strC" "strD" "strF"
[19,] "strC" "strE" "strF"
[20,] "strD" "strE" "strF"
but I would like to have all sets of combinations of 3, in which each string only appears once. So my desired output would look sth like this:
$1
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strD" "srtE" "strF"
$2
[,1] [,2] [,3]
[1,] "strA" "strB" "strD"
[1,] "strC" "strE" "strF"
$3
[,1] [,2] [,3]
[1,] "strA" "strB" "strE"
[1,] "strC" "strD" "strF"
...
So here each subelement ($1
, $2
, $3
, etc.) contains 2 combinations of 3 strings (as 2*3=6; with 6 strings). In each set each string must not appear more than once.
Of course it would be great if this would be also possible for lengths of mylist
that are not a multiple of n=3
. In case we have let's say 10 strings (with the addition of "strG", "strH", "strI", and "strJ"), I would like to have one string ommitted in each combination. So the desired result would be sth like
$1
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
[3,] "strG" "strH" "strI"
$2
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
[3,] "strG" "strH" "strJ"
$3
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
[3,] "strG" "strI" "strJ"
$4
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strE" "strF" "strG"
[3,] "strH" "strI" "strJ"
...
Does someone have a solution to this? If my explanation was unclear, please let me know.
Cheers
Will assume the transposed combo matrix is named mat
. Check to see if there is any overlap with length applied to results of intersect
function:
res <- list();
for (i in 1:nrow(mat) ){
for( j in 1:nrow(mat)){
if( !length(intersect(mat[i,] , mat[j,])) )
res[[paste(i,j,sep="_")]] <- rbind( mat[i,], mat[j, ]) } }
> res
$`1_20`
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
$`2_19`
[,1] [,2] [,3]
[1,] "strA" "strB" "strD"
[2,] "strC" "strE" "strF"
$`3_18`
[,1] [,2] [,3]
[1,] "strA" "strB" "strE"
[2,] "strC" "strD" "strF"
.... snipped
Depending on your definition of "unique" you might decide to only take the first ten items since half of these are transpositions of rows:
> res[[1]]
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
> res[[20]]
[,1] [,2] [,3]
[1,] "strD" "strE" "strF"
[2,] "strA" "strB" "strC"