c++ memory floating-point precision ieee-754

Floating point types representation


Is std::numeric_limits<float>::is_iec559 together with std::numeric_limits<float>::digits == 24 enough to ensure (1) that float is binary32 (2) in IEEE 754? The same question applies to double with ... digits == 53.

  1. In any case, including the weirdest implementations that still respect the C++ standard.
  2. "binary32" is a specific floating-point format defined by the IEEE 754 standard; I don't mean "stored in 32 bits".

Edit: plus the check std::numeric_limits<float>::max_exponent - 1 == 127.

Edit: Are there any other ways? If so, which one is "the best"?
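
For concreteness, the conditions listed in the question and its edits can be gathered into one compile-time predicate. This is only a sketch of what is being asked about, not a demonstration that the conditions are sufficient. The helper name looks_like_binary32 is made up for illustration, and the radix, min_exponent and width checks are extra assumptions that do not appear in the question.

```cpp
#include <climits>
#include <limits>

// Hypothetical helper: collects the checks from the question in one place.
template <typename T>
constexpr bool looks_like_binary32() {
    using lim = std::numeric_limits<T>;
    return lim::is_iec559                  // implementation claims IEC 559 (IEEE 754) conformance
        && lim::digits == 24               // 23 stored mantissa bits + 1 implicit bit
        && lim::max_exponent - 1 == 127    // largest unbiased exponent of a finite value
        // Extra checks, not in the question (assumptions of this sketch):
        && lim::radix == 2
        && lim::min_exponent - 1 == -126   // smallest unbiased exponent of a normal value
        && sizeof(T) * CHAR_BIT == 32;     // total storage width in bits
}
```

It could then be used as static_assert(looks_like_binary32<float>(), "float is not binary32");.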


Solution

  • You can use a traits class to check that your representation matches the layout you expect.

    Here are the traits used to test your representation:

    #include <cstddef>
    #include <cstdint>
    #include <limits>

    namespace num {
        // Expected layout parameters, selected by the size of the type in bytes.
        template <std::size_t N> struct ieee754_traits;

        // binary32: 1 sign bit, 8 exponent bits, 23 stored mantissa bits
        template <> struct ieee754_traits<4> {
          using unsigned_type = std::uint32_t;
          static constexpr std::size_t sign_size = 1;
          static constexpr std::size_t exp_size = 8;
          static constexpr std::size_t mant_size = 23;
          static constexpr std::size_t exp_shift = 127;   // exponent bias
          static constexpr std::int32_t exp_mask = 0xFF;
          static constexpr unsigned_type mant_mask = 0x7FFFFF;
        };

        // binary64: 1 sign bit, 11 exponent bits, 52 stored mantissa bits
        template <> struct ieee754_traits<8> {
          using unsigned_type = std::uint64_t;
          static constexpr std::size_t sign_size = 1;
          static constexpr std::size_t exp_size = 11;
          static constexpr std::size_t mant_size = 52;
          static constexpr std::size_t exp_shift = 1023;  // exponent bias
          static constexpr std::int32_t exp_mask = 0x7FF;
          static constexpr unsigned_type mant_mask = 0xFFFFFFFFFFFFF;
        };

        template <typename T>
        constexpr bool check_ieee754() {
            using traits = ieee754_traits<sizeof(T)>;
            using limits = std::numeric_limits<T>;
            // add more checks here
            return limits::is_iec559 &&
                   // digits counts the implicit leading bit as well
                   limits::digits == static_cast<int>(traits::mant_size + 1) &&
                   // max_exponent is one more than the largest unbiased exponent
                   limits::max_exponent == static_cast<int>(traits::exp_mask) - static_cast<int>(traits::exp_shift);
        }
    }
    

    Then, you can check your representation:

    static_assert(sizeof(float) == 4 && sizeof(double) == 8, "unexpected size for float or double");
    static_assert(num::check_ieee754<float>(), "does not match ieee754");
    static_assert(num::check_ieee754<double>(), "does not match ieee754");
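
    The masks and the bias in the traits describe the bit layout, which std::numeric_limits alone does not expose. As a complement to the compile-time checks, a run-time spot check can copy a few known values into an integer and compare the fields. The sketch below is a minimal illustration for float only: the namespace num_demo and the names fields, decompose and spot_check_binary32 are hypothetical, the constants are written out instead of reusing the traits to keep the block self-contained, and it assumes float and std::uint32_t share the same byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

namespace num_demo {  // hypothetical namespace, separate from num above
    struct fields {
        std::uint32_t sign;      // 1 bit
        std::uint32_t exponent;  // 8 bits, still biased by 127
        std::uint32_t mantissa;  // 23 stored bits
    };

    // Split a float into raw fields, assuming the binary32 layout.
    inline fields decompose(float f) {
        static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits wide");
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);  // reads the object representation without aliasing issues
        return { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    }

    // Compare a few values against their known binary32 encodings.
    inline bool spot_check_binary32() {
        const fields one = decompose(1.0f);    // 0x3F800000
        const fields neg = decompose(-2.5f);   // 0xC0200000
        const fields den = decompose(std::numeric_limits<float>::denorm_min());  // 0x00000001
        return one.sign == 0 && one.exponent == 127 && one.mantissa == 0
            && neg.sign == 1 && neg.exponent == 128 && neg.mantissa == 0x200000u
            && den.sign == 0 && den.exponent == 0   && den.mantissa == 1;
    }
}
```

    A passing spot check does not prove conformance either; it only rules out layouts that disagree on the sampled values.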