Search code examples
rsortingswap

Counting the number of swaps in `base::sort` in R


The base::sort function return the ascending (descending) order of an unordered vector.

X <- c(3,4,2,5,1)
sort(X)
[1] 1 2 3 4 5

Is there any way to count the number of swaps performed by the function to obtain the ordered vector?


Solution

  • The stringdist package can do this kind of calculation on strings. So if you have a small enough number of symbols, you could use that. For example:

    X <- c(3,4,2,5,1)
    Y <- sort(X)
    
    # Convert to strings
    tostring <- function(x, symbols = sort(unique(x))) {
      alphabet <- c(letters, LETTERS)
      if (length(symbols) > length(alphabet))
        stop("You need a bigger alphabet!")
      if (any(! x %in% symbols))
        stop("x has unknown symbols!")
      paste(alphabet[match(x, symbols)], collapse="")
    }
    
    symboldist <- function(x, y, method = "osa") {
      symbols <- unique(c(x, y))
      stringdist::stringdist(tostring(x, symbols), tostring(y, symbols), 
                             method = method)
    }
    
    symboldist(X, Y)
    #> [1] 4
    

    Created on 2021-08-08 by the reprex package (v2.0.0)

    Edited to add: It looks as though the "transposition distance" is what you want. Here's an implementation based on a Javascript one found at https://www.geeksforgeeks.org/number-of-transpositions-in-a-permutation/:

    # Translation of Javascript transposition distance calculation
    # from https://www.geeksforgeeks.org/number-of-transpositions-in-a-permutation/
    
    transpositionDistance <- function(P) {
      dfs <- function(i) {
        result <- 0
        while (!visited[i]) {
          visited[i] <<- TRUE
          i <- goesTo[i]
          result <- result + 1
        }
        result
      }
      
      # Convert P into a permutation of 1:n
      P <- match(P, sort(P))
      
      n <- length(P)
      if (!all(seq_len(n) %in% P))
        stop("this only works on permutations of unique elements")
      visited <- logical(n)
      goesTo <- integer(n)
      
      goesTo[P] <- seq_along(P)
      transpositions <- 0
      for (i in seq_len(n))
        if (!visited[i])
          transpositions <- transpositions + dfs(i) - 1
      transpositions
    }
    
    transpositionDistance(c(3,4,2,5,1))
    #> [1] 4
    transpositionDistance(c(1,3,5,2,4))
    #> [1] 3
    transpositionDistance(c(1,4,3,2,5))
    #> [1] 1