R seems to support an efficient NA value in floating point arrays. How does it represent it internally?
My (perhaps flawed) understanding is that modern CPUs can carry out floating point calculations in hardware, including efficient handling of Inf, -Inf and NaN values. How does NA fit into this, and how is it implemented without compromising performance?
In IEEE 754 double precision, +Inf and -Inf are represented with all bits of the exponent (bits 2 through 12, counting the sign bit as bit 1) set to one and all bits of the mantissa set to zero, whereas NaN has a non-zero mantissa. R uses different mantissa values to represent NaN and NA_real_. We can use a simple C++ function to make this explicit:
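Rcpp::cppFunction('void print_hex(double x) {
    // Copy the raw bits of the double into an integer of the same size
    uint64_t y;
    static_assert(sizeof x == sizeof y, "Size does not match!");
    std::memcpy(&y, &x, sizeof y);
    // Print the bit pattern in hexadecimal, then restore decimal output
    Rcpp::Rcout << std::hex << y << std::dec << std::endl;
}', plugins = "cpp11", includes = c("#include <cstdint>", "#include <cstring>"))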
print_hex(NA_real_)
#> 7ff00000000007a2
print_hex(NaN)
#> 7ff8000000000000
print_hex(Inf)
#> 7ff0000000000000
print_hex(-Inf)
#> fff0000000000000
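The output above shows that NA_real_ is itself a NaN to the hardware: the exponent is all ones and the mantissa is non-zero, with the low bits holding 0x7a2 (1954 in decimal). That suggests how NA can be told apart from an ordinary NaN without any extra storage. The following is a minimal standalone sketch of that idea, not R's actual source; the helper names from_bits, to_bits and is_na_like are made up for illustration, and comparing only the low 32 bits is an assumption based on the printed pattern.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as a double (same memcpy trick as print_hex).
double from_bits(std::uint64_t bits) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "Size does not match!");
    double x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Reinterpret a double as its raw 64-bit pattern.
std::uint64_t to_bits(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Hypothetical check: a value counts as "NA" if the hardware sees a NaN
// and the low 32 bits of the mantissa carry the payload 1954 (0x7a2).
// Assumption: only the low word is compared, so the result does not
// depend on the upper mantissa bits.
bool is_na_like(double x) {
    return std::isnan(x) && (to_bits(x) & 0xffffffffu) == 0x7a2u;
}
```

With these helpers, is_na_like(from_bits(0x7ff00000000007a2)) is true, while the plain NaN pattern 0x7ff8000000000000 and the Inf pattern 0x7ff0000000000000 are rejected. Since the CPU treats NA as just another NaN, arithmetic on it needs no special branches; whether the payload survives a given operation depends on the platform, so I would not rely on it beyond what R itself documents.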
Here are some source code references.