Search code examples
rcombinationscombinatoricscombn

combn-like task for combinations as elements in R


not sure if I chose a good title at all... Also I don't know if I use the right terminology, so maybe with the right search terms I would find a solution to this problem...

I have a list of strings from which I want to have all sets of "exclusive" combinations of 3.

Example: With the following

require(utils)
mylist<-c("strA","strB","strC","strD","strE","strF")
t(combn(mylist,3))

I get a table that lists all possible combinations of 3 out of those 6 strings (so each row represents one combination of 3):

        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strA" "strB" "strD"
   [3,] "strA" "strB" "strE"
   [4,] "strA" "strB" "strF"
   [5,] "strA" "strC" "strD"
   [6,] "strA" "strC" "strE"
   [7,] "strA" "strC" "strF"
   [8,] "strA" "strD" "strE"
   [9,] "strA" "strD" "strF"
  [10,] "strA" "strE" "strF"
  [11,] "strB" "strC" "strD"
  [12,] "strB" "strC" "strE"
  [13,] "strB" "strC" "strF"
  [14,] "strB" "strD" "strE"
  [15,] "strB" "strD" "strF"
  [16,] "strB" "strE" "strF"
  [17,] "strC" "strD" "strE"
  [18,] "strC" "strD" "strF"
  [19,] "strC" "strE" "strF"
  [20,] "strD" "strE" "strF"

but I would like to have all sets of combinations of 3, in which each string only appears once. So my desired output would look sth like this:

$1
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "srtE" "strF"
$2
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strD"
   [1,] "strC" "strE" "strF"
$3
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strE"
   [1,] "strC" "strD" "strF"
...

So here each subelement ($1, $2, $3, etc.) contains 2 combinations of 3 strings (as 2*3=6; with 6 strings). In each set each string must not appear more than once.

Of course it would be great if this would be also possible for lengths of mylist that are not a multiple of n=3. In case we have let's say 10 strings (with the addition of "strG", "strH", "strI", and "strJ"), I would like to have one string ommitted in each combination. So the desired result would be sth like

$1
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
   [3,] "strG" "strH" "strI"
$2
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
   [3,] "strG" "strH" "strJ"
$3
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
   [3,] "strG" "strI" "strJ"
$4
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strE" "strF" "strG"
   [3,] "strH" "strI" "strJ"
...

Does someone have a solution to this? If my explanation was unclear, please let me know.

Cheers


Solution

  • Will assume the transposed combo matrix is named mat. Check to see if there is any overlap with length applied to results of intersect function:

     res <- list();
     for (i in 1:nrow(mat) ){
        for( j in 1:nrow(mat)){  
              if( !length(intersect(mat[i,] , mat[j,])) ) 
                   res[[paste(i,j,sep="_")]] <- rbind( mat[i,], mat[j, ]) } }
    
    
    > res
    $`1_20`
         [,1]   [,2]   [,3]  
    [1,] "strA" "strB" "strC"
    [2,] "strD" "strE" "strF"
    
    $`2_19`
         [,1]   [,2]   [,3]  
    [1,] "strA" "strB" "strD"
    [2,] "strC" "strE" "strF"
    
    $`3_18`
         [,1]   [,2]   [,3]  
    [1,] "strA" "strB" "strE"
    [2,] "strC" "strD" "strF"
    
    .... snipped
    

    Depending on your definition of "unique" you might decide to only take the first ten items since half of these are transpositions of rows:

    > res[[1]]
         [,1]   [,2]   [,3]  
    [1,] "strA" "strB" "strC"
    [2,] "strD" "strE" "strF"
    > res[[20]]
         [,1]   [,2]   [,3]  
    [1,] "strD" "strE" "strF"
    [2,] "strA" "strB" "strC"