Why is the formatC() function introducing NAs by coercion for some specific values?


I'm trying to format numbers so they have a fixed width, adding leading zeros where needed. Following this answer to a related question, I'm using the formatC function to achieve this, but I am getting unexpected results.

For instance, this code works as expected:

formatC(2102040015, format = "d", width = 10, flag = "0")
## [1] "2102040015"
formatC(102040015, format = "d", width = 10, flag = "0")
## [1] "0102040015"

But when I use the very same approach with these numbers, I get a strange result:

formatC(2152040015, format = "d", width = 10, flag = "0")
## Warning message:
## In storage.mode(x) <- "integer" :
##  NAs introduced by coercion to integer range
## [1] "        NA"
formatC(2522040015, format = "d", width = 10, flag = "0")
## Warning message:
## In storage.mode(x) <- "integer" :
##  NAs introduced by coercion to integer range
## [1] "        NA"

After some testing, I have come to the conclusion that for every number greater than 2150000000 I get this warning and the "        NA" result. I would appreciate any insight into this behavior. Thank you in advance!


Solution

  • When you use format="d" you are telling R that you will be formatting integers specifically. The largest integer R can store is .Machine$integer.max, which is 2^31 - 1:

    .Machine$integer.max
    # [1] 2147483647
    

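    Numbers above that limit cannot be stored as integers, so R keeps them as floating-point numbers (doubles), and coercing them to integer yields NA. You may therefore want to format the value as a double with no decimal places instead: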

    formatC(2152040015, format = "f", width = 10, flag = "0", digits = 0)
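
To make the cause and the fix easier to see side by side, here is a minimal sketch using two of the values from the question. The helper name `pad_number` is hypothetical (it is not part of base R); it only wraps the `formatC()` call shown above so that it can be applied to a whole vector.

```r
# Values from the question: one below and one above .Machine$integer.max
x <- c(102040015, 2152040015)

# Numeric literals are doubles, so both values are stored without loss
is.double(x)
## [1] TRUE

# Coercion to integer is what produces the NA for the larger value
# (the warning is suppressed here to keep the output short)
as_int <- suppressWarnings(as.integer(x))
is.na(as_int)
## [1] FALSE  TRUE

# Hypothetical helper: zero-pad whole numbers that are held as doubles
pad_number <- function(x, width = 10) {
  formatC(x, format = "f", digits = 0, width = width, flag = "0")
}

padded <- pad_number(x)
padded
## [1] "0102040015" "2152040015"
```

Assuming the values are always whole numbers, `sprintf("%010.0f", x)` should give the same result, since it also formats the value as a double with zero decimal places.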