Search code examples
ralgorithmfunctionranking

R: enumerating possible sequences to break ties in a ranking


  MEMORY1 MEMORY2 MEMORY3 MEMORY4 MEMORY5 MEMORY6 MEMORY7 MEMORY8
        1     2.5     2.5       4       5       6       7       8

Suppose I have the above ranking of 8 items. There are 2 ways to break this tie: 1 2 3 4 5 6 7 8 or 1 3 2 4 5 6 7 8. I am trying to write a function which outputs these two possible sequences when given the original ranking with ties in it.

In the case of

  MEMORY1 MEMORY2 MEMORY3 MEMORY4 MEMORY5 MEMORY6 MEMORY7 MEMORY8
      4.5     4.5     4.5     4.5     4.5     4.5     4.5     4.5

All items are tied, so there's 8! possible sequences. permn(8) or something similar would enumerate the sequences just fine.

In the case of

  MEMORY1 MEMORY2 MEMORY3 MEMORY4 MEMORY5 MEMORY6 MEMORY7 MEMORY8
      7.5       5       5       2     7.5       2       2       5

There are 3! x 3! x 2! = 72 possible sequences. How can I write a function that outputs these 72 possible sequences given the original ranking?

myfun = function(ranking){
  output = vector()
  values = sort(unique(ranking))
  if(length(values) < 8){
    #if there are ties
    for(i in 1:length(values)){
      value_in_question = values[i]
      if(sum(value_in_question %in% values[i] == 1)){
        output = output
      }else output[i] = permn(values[i])
    }
  }
  return(output)
}

This is my attempt, it doesn't work. And I'm having trouble coming up with a way to enumerate the sequences when there are multiple ties...

EDIT:

dat = c(2.5, 2.5, 2.5, 2.5, 6.5, 6.5, 6.5, 6.5)
names(dat) <- paste0("MEMORY", 1:8)

## Group similar items, compute run lengths, then permute
library(combinat) # permn
gs <- cumsum(abs(c(0, diff(sort(dat)))) > 1e-9)
lens <- rle(gs)$lengths
lst <- mapply(function(a,b) lapply(permn(1:b), `+`, a), c(0, cumsum(head(lens, -1))), lens)

## Expand into data.frame (don't expand if all were the same)
res <- if(!is.null(dim(lst)) && dim(lst)[2] == 1) lst else expand.grid(lst)

 Error: cannot allocate vector of size 16.0 Gb
In addition: Warning messages:
1: In rep.int(seq_len(nx), rep.int(rep.fac, nx)) :
  Reached total allocation of 8070Mb: see help(memory.size)
2: In rep.int(seq_len(nx), rep.int(rep.fac, nx)) :
  Reached total allocation of 8070Mb: see help(memory.size)
3: In rep.int(seq_len(nx), rep.int(rep.fac, nx)) :
  Reached total allocation of 8070Mb: see help(memory.size)
4: In rep.int(seq_len(nx), rep.int(rep.fac, nx)) :
  Reached total allocation of 8070Mb: see help(memory.size)
5: In rep.int(seq_len(nx), rep.int(rep.fac, nx)) :
  Reached total allocation of 8070Mb: see help(memory.size)
6: In rep.int(seq_len(nx), rep.int(rep.fac, nx)) :
  Reached total allocation of 8070Mb: see help(memory.size)

Solution

  • Since you are comparing floats, you don't want to use == tests. Instead, check that the difference between numbers is sufficiently small. Here is a possible solution that makes no claims on efficiency.

    ## Example
    dat <- c(7.5, 5, 5, 2, 7.5, 2, 2, 5)
    names(dat) <- paste0("MEMORY", 1:8)
    
    ## Group similar items, compute run lengths, then permute
    library(combinat) # permn
    gs <- cumsum(abs(c(0, diff(sort(dat)))) > 1e-9)
    lens <- rle(gs)$lengths
    lst <- mapply(function(a,b) lapply(permn(1:b), `+`, a),
                  c(0, cumsum(head(lens, -1))), lens, SIMPLIFY = FALSE)
    
    ## Expand into data.frame (don't expand if all were the same)
    res <- if(!is.null(dim(lst)) && dim(lst)[2] == 1) lst else expand.grid(lst)
    
    ## Unnest columns if desired
    res <- data.frame(t(apply(res, 1, unlist)))
    
    ## Name the columns
    names(res) <- names(sort(dat))
    
    head(res)
    #   MEMORY4 MEMORY6 MEMORY7 MEMORY2 MEMORY3 MEMORY8 MEMORY1 MEMORY5
    # 1       1       2       3       4       5       6       7       8
    # 2       1       3       2       4       5       6       7       8
    # 3       3       1       2       4       5       6       7       8
    # 4       3       2       1       4       5       6       7       8
    # 5       2       3       1       4       5       6       7       8
    # 6       2       1       3       4       5       6       7       8
    ## Gets all 72 sequences from example: 3!*3!*2!
    nrow(res)
    # [1] 72
    

    The result should be a data.frame where each row is one of the possible sequences (the sequences being indexes of the sorted data).