Tags: r, performance, time, exponentiation, coding-efficiency

Why does exponentiation (e.g., 10^6) take 4 times longer than calculator notation (e.g., 1e6) in R?


Using the power notation 10^6 in R code (as I customarily do) takes significantly longer to compute than using the calculator notation 1e6:

> system.time(for (t in 1:1e7) x=10^6)
   user  system elapsed 
  4.792   0.000   4.281 
> system.time(for (t in 1:1e7) x=1e6)
   user  system elapsed 
  0.804   0.000   1.051 
> system.time(for (t in 1:1e7) x=exp(6*log(10)))
   user  system elapsed 
  6.301   0.000   5.702 

Why does R recompute 10^6 in about the same time as it takes to compute exp(6*log(10))? I understand that R executes a function when computing 10^6, but why was it coded this way?


Solution

  • It's because 1e6 is a constant, and is recognized as such by the parser, whereas 10^6 is parsed as a function call that must be evaluated (by a call to the function ^()) each time it is encountered. Since the constant avoids the overhead of a function call, evaluating it is a lot faster.

    class(substitute(1e6))
    # [1] "numeric"
    class(substitute(10^6))
    # [1] "call"
    

    To better see that it's a call, you can dissect it like this:

    as.list(substitute(10^6))
    # [[1]]
    # `^`
    # 
    # [[2]]
    # [1] 10
    # 
    # [[3]]
    # [1] 6
    

    A few other interesting cases:

    ## negative numbers are actually parsed as function calls
    class(substitute(-1))
    # [1] "call"
    
    ## when you want an integer, 'L' notation lets you avoid a function call 
    class(substitute(1000000L))
    # [1] "integer"
    class(substitute(as.integer(1000000)))
    # [1] "call"
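
As a rough sketch of how this plays out in practice: the snippet below checks how the parser classifies each form, confirms that the operator form is ordinary function application, and then hoists the power out of a loop so that ^ is called only once. The helper parsed_class() and the variable names are hypothetical, chosen only for illustration. Timings depend on the machine and the R version, so none are shown.

```r
# Hypothetical helper (not from the original answer): report how the parser
# classifies an unevaluated expression.
parsed_class <- function(expr) class(expr)[1]

parsed_class(quote(1e6))    # a numeric constant, no evaluation needed
parsed_class(quote(10^6))   # a call to `^`, evaluated every time it is reached

# The operator form is ordinary function application, so these three
# lines all go through the same `^` function.
direct       <- 10^6
via_backtick <- `^`(10, 6)
via_eval     <- eval(quote(10^6))
stopifnot(identical(direct, via_backtick), identical(direct, via_eval))

# In a hot loop, compute the power once and reuse the stored value,
# so `^` is called one time instead of once per iteration.
million <- 10^6
total <- 0
for (t in 1:1000) total <- total + million
```

If the value is known when the code is written, typing 1e6 (or 1000000L for an integer) avoids the call altogether. Hoisting is the fallback when the base or exponent is only known at run time.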