Search code examples
rperformancefactorial

R Programming Efficiency - factorial decomposition of a number


I am trying to solve a kata from codewars called factorial decomposition in R. The aim of the kata is to decompose n! (factorial n) into its prime factors. The function should return a string like

decomp(12) -> "2^10 * 3^5 * 5^2 * 7 * 11"

I was able to solve it, but reached a server timeout (passing 74 assignments). I triet to optimize it a bit (lapply of pointwise()), but the essential core (for-while-loop) I was not able to alter.

Any help would be appreciated, as I put already more time into it than I should have.

##' A function for a factorial decomposition of a number
##' @title decomp
##' @param n integer
##' @return a String with the factorial decomposition
##' @author krisselack

decomp <- function(n) {

  # https://stackoverflow.com/questions/19767408/prime-number-function-in-r
  is.prime <- function(n) n == 2L || all(n %% 2L:ceiling(sqrt(n)) != 0)

  p <- 2:n
  primes <- p[as.logical(vapply(p, is.prime, 1))]
  erg <- NULL

  pointwise <- function(x) {

    primloop <- primes[primes<=x]

    for(j in primloop){

      while(x %% j == 0){
        x <- x/j
        erg <- c(erg, j)
      }
    }

    if(length(erg)>0)
      return(erg)
  }

  erg2 <- unlist(lapply(p, pointwise))

  ergfin <- table(erg2)

  namen <- paste(ifelse(ergfin>1, paste0(names(ergfin), "^", ergfin),
                        paste(names(ergfin))),
                 collapse = " * ")

  return(namen)
}

decomp(5) # -> "2^3 * 3 * 5"
decomp(12) # -> "2^10 * 3^5 * 5^2 * 7 * 11"
decomp(17) # -> "2^15 * 3^6 * 5^3 * 7^2 * 11 * 13 * 17"
decomp(25) # -> "2^22 * 3^10 * 5^6 * 7^3 * 11^2 * 13 * 17 * 19 * 23"

Solution

  • library("purrr")
    
    # https://stackoverflow.com/questions/19767408/prime-number-function-in-r
    is.prime <- function(n) n == 2L || all(n %% 2L:ceiling(sqrt(n)) != 0)
    
    #' Multiplicity of prime p in the decomp of n!
    #' @param p A prime
    #' @param n An integer
    multiplicity_in_factorial <- function(p, n) {
      # Adding epsilon to avoid rounding errors.
      # For example at p = 3, n = 243
      max_mul <- floor(log(n) / log(p) + 0.0001)
      prime_mul <- p ^ (1:max_mul)
      how_many_of_each <- map_dbl(prime_mul, ~ floor(n / .))
      sum(how_many_of_each)
    }
    
    
    
    decomp2 <- function(n) {
      p <- 2:n
      primes <- p[as.logical(vapply(p, is.prime, 1))]
    
      primes_mul <- map_dbl(primes, multiplicity_in_factorial, n)
    
      namen <- paste(ifelse(primes_mul > 1,
                            paste0(primes, "^", primes_mul),
                            primes),
                     collapse = " * ")
    
      return(namen)
    }
    
    check <- function(n) {
      decomp(n) == decomp2(n)
    }
    

    The idea is to loop on primes below n and figure out how often they appear in the factorial.

    The key is that the multiplicity of p in n! is the sum of the multiplicities of p in k for k = 1..n.

    To illustrate, n = 100 and p = 2. There are 50 multiples of 2 between 1 and 100. But this does not account for factors with multiplicity > 1.

    We also have to consider the multiples of 4 (there are 25), of 8 (there are 12), of 16 (there are 6), of 32 (there are 3) and of 64 (there is 1).

    This is what happens in multiplicity in factorial. The rest is straightforward.

    The bottleneck on high values is the computation of primes which could be improved by using the sieve of Eratosthenes.

    # https://gist.github.com/seankross/5946396
    microbenchmark::microbenchmark(
       sieve = sieveOfEratosthenes(N),
       naive_filter = {
         p <- 2:N
         primes <- p[as.logical(vapply(p, is.prime, 1))]
       }
     )
    
    Unit: microseconds
              expr      min        lq      mean    median       uq      max neval
             sieve  395.010  405.4015  423.2184  410.8445  439.629   584.71   100
      naive_filter 2875.782 2936.5195 3268.4576 2979.4925 3016.060 16875.81   100
    

    But I wouldn't bother: the real bottleneck will be the string pasting which is notoriously slow.

    On my laptop decomp2(10e5) takes a few seconds and decomp2(10e6) takes around 2 minutes. I am 99% certain the string pasting is actually the bottleneck in that case.