R fractions with library(MASS)


I've got a long vector of numbers stored as character strings (around 50,000 elements), which as.numeric converts to numeric very quickly:

    y = c("-1", "1", "1", ...)

The problem is that I've extended the functionality to include fractions, and calling

    y = c("-1/2", "1", "1", ...)
    y = as.numeric(y);

returns NA for every fraction, along with the warning "NAs introduced by coercion", while calling

    sapply(y, function(x) {
      eval(parse(text = x))
    })

solves the problem, but takes much longer to execute. Is there a better way to do this?
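
To make the failure concrete, here is a minimal sketch (the four-element y is a made-up sample) showing that as.numeric() turns each fraction into NA. Those NA positions are also how the solution below finds the entries that need extra work:

```r
# Made-up sample input mixing integers and fractions
y <- c("-1/2", "1", "1", "1/2")

# "-1/2" is not a numeric literal, so as.numeric() gives NA for it;
# suppressWarnings() hides the "NAs introduced by coercion" warning
converted <- suppressWarnings(as.numeric(y))
print(converted)            # NA  1  1 NA

# The NA positions mark the strings that still need converting
print(y[is.na(converted)])  # "-1/2" "1/2"
```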


Solution

eval(parse(text = ...)) is very slow. Since you know exactly what format the input takes, you can write a faster function:

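    y <- c("-1/2", "1", "1", "1/2")
    fixnums <- function(x){
      # Plain numbers convert directly; fractions become NA
      # (as.numeric still warns "NAs introduced by coercion" for them)
      temp <- as.numeric(x)
      # Split each fraction on "/" and divide numerator by denominator;
      # assigning a list turns temp into a list, hence the unlist() below
      temp[is.na(temp)] <- lapply(strsplit(x[is.na(temp)], "/"),
                                  function(s) as.numeric(s[1]) / as.numeric(s[2]))
      unlist(temp)
    }
    fixnums(y)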
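
The heart of fixnums() is the strsplit() step. A small sketch on two made-up fractions shows what that step produces. It uses vapply() with numeric(1) instead of lapply(), so the result is already a numeric vector and no unlist() is needed. This is a design variation, not part of the original answer:

```r
fracs <- c("-1/2", "3/4")

# strsplit() returns a list with one character vector per input string:
# list(c("-1", "2"), c("3", "4"))
parts <- strsplit(fracs, "/", fixed = TRUE)
str(parts)

# vapply() checks that each call returns exactly one number,
# so the result is a plain numeric vector
vals <- vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
print(vals)  # -0.50  0.75
```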
    

A faster version that avoids the lapply, suggested by @DavidArenburg in a comment:

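    davidfixnums <- function(x){
      temp <- as.numeric(x)
      # Split all fractions at once; fixed = TRUE skips regex matching.
      # The result alternates numerator, denominator, numerator, ...
      temp2 <- as.numeric(unlist(strsplit(x[is.na(temp)], "/", fixed = TRUE)))
      # Odd positions are numerators, even positions are denominators
      temp[is.na(temp)] <- temp2[c(TRUE, FALSE)] / temp2[c(FALSE, TRUE)]
      temp
    }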
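
davidfixnums() is faster because it calls strsplit() and as.numeric() once on all the fractions and then separates numerators from denominators with recycled logical indices. A sketch of that indexing trick on its own, with made-up input:

```r
fracs <- c("-1/2", "3/4", "5/8")

# Flatten all pieces into one numeric vector: -1, 2, 3, 4, 5, 8
pieces <- as.numeric(unlist(strsplit(fracs, "/", fixed = TRUE)))

# A short logical index is recycled along the vector:
# c(TRUE, FALSE) keeps positions 1, 3, 5 and c(FALSE, TRUE) keeps 2, 4, 6
numerators   <- pieces[c(TRUE, FALSE)]
denominators <- pieces[c(FALSE, TRUE)]

result <- numerators / denominators
print(result)  # -0.500  0.750  0.625
```

The trick relies on every fraction splitting into exactly two pieces. An entry such as "1/2/3" would shift every later pair out of alignment, whereas fixnums() only reads the first two pieces of each entry.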
    

Some benchmarks, using @akrun's and @DavidArenburg's suggestions:

    library(microbenchmark)
    set.seed(1234)
    y <- sample(c("-1/2", "1", "1", "1/2"), 10000, replace = TRUE)
    
    akrunfixnums <- function(y){
      x1 <- as.numeric(y)
      x1[is.na(x1)] <- vapply(y[is.na(x1)], function(x) 
        eval(parse(text=x)), numeric(1))
      x1
    }
    
    microbenchmark(fixnums(y), davidfixnums(y), akrunfixnums(y))
    
    Unit: milliseconds
                expr        min         lq       mean     median        uq       max neval cld
          fixnums(y)  22.643745  23.157345  25.326465  23.435554  23.98544 154.16316   100  b 
     davidfixnums(y)   6.676234   6.778378   6.957626   6.824459   6.93025  10.12763   100 a  
     akrunfixnums(y) 845.404840 858.031737 869.886625 865.255363 875.54351 960.86497   100   c
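
Timings are only meaningful if the versions return the same numbers. A minimal sketch of such a check compares a split-and-divide converter against eval(parse()), which serves as a slow reference. The function names fast_fractions and ref_fractions are hypothetical, and the sample mirrors the benchmark input on a smaller scale:

```r
# Benchmark-like sample input, smaller than the original 10000 draws
set.seed(1234)
y <- sample(c("-1/2", "1", "1", "1/2"), 1000, replace = TRUE)

# Hypothetical vectorized converter following the davidfixnums() idea:
# assumes every non-numeric entry is exactly "numerator/denominator"
fast_fractions <- function(x) {
  out <- suppressWarnings(as.numeric(x))
  frac <- is.na(out)
  parts <- as.numeric(unlist(strsplit(x[frac], "/", fixed = TRUE)))
  out[frac] <- parts[c(TRUE, FALSE)] / parts[c(FALSE, TRUE)]
  out
}

# Slow but general reference: parse and evaluate each string as R code
ref_fractions <- function(x) {
  vapply(x, function(s) eval(parse(text = s)), numeric(1), USE.NAMES = FALSE)
}

# Stops with an error if the two disagree anywhere
stopifnot(isTRUE(all.equal(fast_fractions(y), ref_fractions(y))))
```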