Search code examples
rmathfactorialarithmetic-expressions

In R why is factorial(100) displayed differently to prod(1:100)?


In R I am finding some odd behaviour that I can't explain and I am hoping someone here can. I believe that the value of 100! is this big number.

A few lines from the console showing expected behaviour...

>factorial( 10 )
[1] 3628800
>prod( 1:10 )
[1] 3628800
> prod( as.double(1:10) )
[1] 3628800
> cumprod( 1:10 )
[1]       1       2       6      24     120     720    5040   40320  362880 3628800

However when I try 100! I get (notice how the resulting numbers begin to differ about 14 digits in):

> options(scipen=200) #set so the whole number shows in the output
> factorial(100)
[1] 93326215443942248650123855988187884417589065162466533279019703073787172439798159584162769794613566466294295348586598751018383869128892469242002299597101203456
> prod(1:100)
[1] 93326215443944102188325606108575267240944254854960571509166910400407995064242937148632694030450512898042989296944474898258737204311236641477561877016501813248
> prod( as.double(1:100) )
[1] 93326215443944150965646704795953882578400970373184098831012889540582227238570431295066113089288327277825849664006524270554535976289719382852181865895959724032
> all.equal( prod(1:100) , factorial(100) , prod( as.double(1:100) ) )
[1] TRUE

If I do some tests against a variable set to the 'known' number of 100! then I see the following:

# This is (as far as I know) the 'true' value of 100!
> n<- as.double(93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000)
> factorial(100) - n
[1] -1902315522848807765998160811905210717565551993186466795054798772271710903343294674760811531554315419925519536152107160826913610179566298858520576
> prod(1:100) - n
[1] -48777321098687378615337456715518223527321845979140174232174327494146433419058837814379782860367062049372295798771978482741374619988879457910784
> prod(as.double(1:100)) - n
[1] 0

The final result evaluates to zero, but the number returned for prod( as.double( 1:100 ) ) does not display as I would expect, even though it correctly evaluates prod( as.double( 1:100 ) ) - n where n is a variable set to the value of 100!.

Can anyone explain this behaviour to me please? It should not be related to overflow etc as far as I am aware, as I am using a x64 system. Version and machine info below:

> .Machine$double.xmax
[1] 1.798e+308
> str( R.Version() )
List of 14
 $ platform      : chr "x86_64-apple-darwin9.8.0"
 $ arch          : chr "x86_64"
 $ os            : chr "darwin9.8.0"
 $ system        : chr "x86_64, darwin9.8.0"
 $ status        : chr ""
 $ major         : chr "2"
 $ minor         : chr "15.2"
 $ year          : chr "2012"
 $ month         : chr "10"
 $ day           : chr "26"
 $ svn rev       : chr "61015"
 $ language      : chr "R"
 $ version.string: chr "R version 2.15.2 (2012-10-26)"
 $ nickname      : chr "Trick or Treat"

Can anyone explain this to me? I don't doubt that R does everything correctly and this is most likely useR related. You might point out that since prod( as.double( 1:100 ) ) - n evaluates correctly what am I bothered about, but I am doing Project Euler Problem 20 so I needed the correct digits displayed.

Thanks


Solution

  • Your test with all.equal does not produce what you expect. all.equal can only compare two values. The third argument is positionally matched to tolerance, which gives the tolerance of the comparison operation. In your invocation to all.equal you give it a tolerance of 100! which definitely leads to the comparison being true for absurdly differing values:

    > all.equal( 0, 1000000000, prod(as.double(1:100)) )
    [1] TRUE
    

    But even if you give it two arguments only, e.g.

    all.equal( prod(1:100), factorial(100) )
    

    it would still produce TRUE because the default tolerance is .Machine$double.eps ^ 0.5, e.g. the two operands have to match to about 8 digits which is definitely the case. On the other hand, if you set the tolerance to 0, then neither three possible combinations emerge equal from the comparison:

    > all.equal( prod(1:100), factorial(100), tolerance=0.0 )
    [1] "Mean relative difference: 1.986085e-14"
    > all.equal( prod(1:100), prod( as.double(1:100) ), tolerance=0.0 )
    [1] "Mean relative difference: 5.22654e-16"
    > all.equal( prod(as.double(1:100)), factorial(100), tolerance=0.0 )
    [1] "Mean relative difference: 2.038351e-14"
    

    Also note that just because you've told R to print 200 significant numbers doesn't mean that they are all correct. Indeed, 1/2^53 has about 53 decimal digits but only the first 16 are considered meaningful.

    This also makes your comparison to the "true" value flawed. Observe this. The ending digits in what R gives you for factorial(100) are:

    ...01203456
    

    You subtract n from it, where n is the "true" value of 100! so it should have 24 zeroes at the end and hence the difference should also end with the same digits that factorial(100) does. But rather it ends with:

    ...58520576
    

    This only shows that all those digits are non-significant and one should not really look into their value.

    It takes 525 bits of binary precision in order to exactly represent 100! - that's 10x the precision of double.