I am trying to solve a kata from codewars called factorial decomposition in R. The aim of the kata is to decompose n! (factorial n) into its prime factors. The function should return a string like
decomp(12) -> "2^10 * 3^5 * 5^2 * 7 * 11"
I was able to solve it, but reached a server timeout (passing 74 assignments). I triet to optimize it a bit (lapply of pointwise()), but the essential core (for-while-loop) I was not able to alter.
Any help would be appreciated, as I put already more time into it than I should have.
##' A function for a factorial decomposition of a number
##' @title decomp
##' @param n integer
##' @return a String with the factorial decomposition
##' @author krisselack
decomp <- function(n) {
# https://stackoverflow.com/questions/19767408/prime-number-function-in-r
is.prime <- function(n) n == 2L || all(n %% 2L:ceiling(sqrt(n)) != 0)
p <- 2:n
primes <- p[as.logical(vapply(p, is.prime, 1))]
erg <- NULL
pointwise <- function(x) {
primloop <- primes[primes<=x]
for(j in primloop){
while(x %% j == 0){
x <- x/j
erg <- c(erg, j)
}
}
if(length(erg)>0)
return(erg)
}
erg2 <- unlist(lapply(p, pointwise))
ergfin <- table(erg2)
namen <- paste(ifelse(ergfin>1, paste0(names(ergfin), "^", ergfin),
paste(names(ergfin))),
collapse = " * ")
return(namen)
}
decomp(5) # -> "2^3 * 3 * 5"
decomp(12) # -> "2^10 * 3^5 * 5^2 * 7 * 11"
decomp(17) # -> "2^15 * 3^6 * 5^3 * 7^2 * 11 * 13 * 17"
decomp(25) # -> "2^22 * 3^10 * 5^6 * 7^3 * 11^2 * 13 * 17 * 19 * 23"
library("purrr")
# https://stackoverflow.com/questions/19767408/prime-number-function-in-r
is.prime <- function(n) n == 2L || all(n %% 2L:ceiling(sqrt(n)) != 0)
#' Multiplicity of prime p in the decomp of n!
#' @param p A prime
#' @param n An integer
multiplicity_in_factorial <- function(p, n) {
# Adding epsilon to avoid rounding errors.
# For example at p = 3, n = 243
max_mul <- floor(log(n) / log(p) + 0.0001)
prime_mul <- p ^ (1:max_mul)
how_many_of_each <- map_dbl(prime_mul, ~ floor(n / .))
sum(how_many_of_each)
}
decomp2 <- function(n) {
p <- 2:n
primes <- p[as.logical(vapply(p, is.prime, 1))]
primes_mul <- map_dbl(primes, multiplicity_in_factorial, n)
namen <- paste(ifelse(primes_mul > 1,
paste0(primes, "^", primes_mul),
primes),
collapse = " * ")
return(namen)
}
check <- function(n) {
decomp(n) == decomp2(n)
}
The idea is to loop on primes below n
and figure out how often they appear in the factorial.
The key is that the multiplicity of p in n! is the sum of the multiplicities of p in k for k = 1..n.
To illustrate, n = 100 and p = 2. There are 50 multiples of 2 between 1 and 100. But this does not account for factors with multiplicity > 1.
We also have to consider the multiples of 4 (there are 25), of 8 (there are 12), of 16 (there are 6), of 32 (there are 3) and of 64 (there is 1).
This is what happens in multiplicity in factorial
. The rest is straightforward.
The bottleneck on high values is the computation of primes
which could be improved by using the sieve of Eratosthenes.
# https://gist.github.com/seankross/5946396
microbenchmark::microbenchmark(
sieve = sieveOfEratosthenes(N),
naive_filter = {
p <- 2:N
primes <- p[as.logical(vapply(p, is.prime, 1))]
}
)
Unit: microseconds
expr min lq mean median uq max neval
sieve 395.010 405.4015 423.2184 410.8445 439.629 584.71 100
naive_filter 2875.782 2936.5195 3268.4576 2979.4925 3016.060 16875.81 100
But I wouldn't bother: the real bottleneck will be the string pasting which is notoriously slow.
On my laptop decomp2(10e5)
takes a few seconds and decomp2(10e6)
takes around 2 minutes. I am 99% certain the string pasting is actually the bottleneck in that case.