c++ c++11 nan ostream

std::num_put issue with NaN-boxing due to implicit conversion from float to double


I'm using the approach from this post to attach some extra information to NaN values, and the one from this post to change how std::cout prints them so that the extra information is displayed.

Here is the code defining the functions and NumPut class:

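#include <algorithm>  // std::copy
#include <bitset>
#include <cassert>
#include <cmath>
#include <iostream>
#include <iterator>   // std::begin, std::end, std::ostreambuf_iterator
#include <limits>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>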

template <typename T>
void showValue( T val, const std::string& what )
{
    union uT {
      T d;
      unsigned long long u;
    };
    uT ud;
    ud.d = val;
    std::bitset<sizeof(T) * 8> b(ud.u);
    std::cout << val << " (" << what << "): " << b.to_string() << std::endl;
}

template <typename T>
T customizeNaN( T value, char mask )
{
    T res = value;
    char* ptr = (char*) &res;
    assert( ptr[0] == 0 );
    ptr[0] |= mask;
    return res;
}

template <typename T>
bool isCustomNaN( T value, char mask )
{
    char* ptr = (char*) &value;
    return ptr[0] == mask;
}

template <typename T>
char getCustomNaNMask( T value )
{
    char* ptr = (char*) &value;
    return ptr[0];
}

template <typename Iterator = std::ostreambuf_iterator<char> >
class NumPut : public std::num_put<char, Iterator>
{
private:
    using base_type = std::num_put<char, Iterator>;

public:
    using char_type = typename base_type::char_type;
    using iter_type = typename base_type::iter_type;

    NumPut(std::size_t refs = 0)
    :   base_type(refs)
    {}

protected:
    virtual iter_type do_put(iter_type out, std::ios_base& str, char_type fill, double v) const override {
        if(std::isnan(v))
        {
            char mask = getCustomNaNMask(v);
            if ( mask == 0x00 )
            {
                out = std::copy(std::begin(NotANumber), std::end(NotANumber), out);
            }
            else
            {
                std::stringstream maskStr;
                maskStr << "(0x" << std::hex << (unsigned) mask << ")";
                std::string temp = maskStr.str();
                out = std::copy(std::begin(CustomNotANumber), std::end(CustomNotANumber), out);
                out = std::copy(std::begin(temp), std::end(temp), out);
            }
        }
        else
        {
            out = base_type::do_put(out, str, fill, v);
        }
        return out;
    }

private:
    static const std::string NotANumber;
    static const std::string CustomNotANumber;
};

template<typename Iterator> const std::string NumPut<Iterator>::NotANumber = "Not a Number";
template<typename Iterator> const std::string NumPut<Iterator>::CustomNotANumber = "Custom Not a Number";

inline void fixNaNToStream( std::ostream& str )
{
    str.imbue( std::locale(str.getloc(), new NumPut<std::ostreambuf_iterator<char>>() ) );
}

A simple test function:

template<typename T>
void doTest()
{
    T regular_nan = std::numeric_limits<T>::quiet_NaN();
    T myNaN1 = customizeNaN( regular_nan, 0x01 );
    T myNaN2 = customizeNaN( regular_nan, 0x02 );

    showValue( regular_nan, "regular" );
    showValue( myNaN1, "custom 1" );
    showValue( myNaN2, "custom 2" );
}

My main program:

int main(int argc, char *argv[])
{
    fixNaNToStream( std::cout );

    doTest<double>();
    doTest<float>();

    return 0;
}

doTest<double> outputs:

Not a Number (regular): 0111111111111000000000000000000000000000000000000000000000000000
Custom Not a Number(0x1) (custom 1): 0111111111111000000000000000000000000000000000000000000000000001
Custom Not a Number(0x2) (custom 2): 0111111111111000000000000000000000000000000000000000000000000010

doTest<float> outputs:

Not a Number (regular): 01111111110000000000000000000000
Not a Number (custom 1): 01111111110000000000000000000001
Not a Number (custom 2): 01111111110000000000000000000010

While I would expect for float:

Not a Number (regular): 01111111110000000000000000000000
Custom Not a Number(0x1) (custom 1): 01111111110000000000000000000001
Custom Not a Number(0x2) (custom 2): 01111111110000000000000000000010

The problem is that num_put only has a virtual do_put for double, not for float. So my float is silently converted to a double, and my extended information is lost.
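The conversion described above can be observed directly. The following is a minimal sketch and not part of the code above: CountingNumPut and countDoublePuts are hypothetical names introduced only for illustration. It relies on the standard stream inserter for float converting its argument to double before calling the facet.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <locale>
#include <sstream>

// Hypothetical facet for illustration: counts how often the double overload runs.
class CountingNumPut : public std::num_put<char> {
public:
    explicit CountingNumPut(int& counter, std::size_t refs = 0)
        : std::num_put<char>(refs), counter_(counter) {}

protected:
    iter_type do_put(iter_type out, std::ios_base& str, char_type fill,
                     double v) const override {
        ++counter_;  // reached for double arguments AND for float arguments
        return std::num_put<char>::do_put(out, str, fill, v);
    }

private:
    int& counter_;  // refers to a counter owned by the caller
};

// Streams one float and one double; returns how many times do_put(double) ran.
inline int countDoublePuts() {
    int calls = 0;
    std::ostringstream os;
    os.imbue(std::locale(os.getloc(), new CountingNumPut(calls)));
    os << 1.5f;  // float: converted to double before it reaches the facet
    os << 2.5;   // double: passed through unchanged
    return calls;
}
```

Calling countDoublePuts() should yield 2, which shows that the facet never sees the original float object, only the double it was converted to.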

I know there are some alternatives, like using FloatFormat from the second post, or simply writing a smart float2double function and calling it before sending my NaN value to the output stream, but they rely on the developer remembering to handle this situation, which is easy to forget.

Is there no way to implement this within the NumPut class, or by any other means, so that things simply work when a float is sent to the imbued stream, as nicely as they do for a double?

My requirement is to be able to simply call a function like fixNaNToStream for any output stream (std::cout, local std::stringstream, ...) and then send float and double to it and get them identified as my custom NaNs and displayed accordingly.


Solution

  • The problem is that num_put only has a virtual do_put for double, not for float. So my float is silently converted to a double, and my extended information is lost.

    The information is lost because the positions of the bits carrying it are different when the number is converted from float to double:

    // Assuming an IEEE-754 floating-point representation of float and double
    0 11111111 10000000000000000000010
    0 11111111111 1000000000000000000001000000000000000000000000000000
    

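    Note that, counting from the top of the word, the mantissa bits are "shifted" by 3 positions, because the exponent requires 3 more bits (11 instead of 8). The 23 mantissa bits of the float end up in the most significant part of the 52-bit mantissa of the double, and the remaining 29 low-order bits are zero. The payload is therefore no longer in the lowest-order byte, which (on a little-endian machine) is the ptr[0] that the code in the question inspects.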
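    As a rough illustration of where the payload goes, here is a minimal sketch. It assumes 32-bit and 64-bit IEEE-754 formats and hardware that keeps the NaN payload when widening, which is common but not guaranteed. The names bitsOf, taggedFloatNaN, kMantissaShift and payloadAfterWidening are hypothetical helpers, not part of the question's code.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <cstring>
    #include <limits>

    // Raw bits of a float / double (assumes 32- and 64-bit IEEE-754 formats).
    inline std::uint32_t bitsOf(float f) {
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        return u;
    }
    inline std::uint64_t bitsOf(double d) {
        std::uint64_t u;
        std::memcpy(&u, &d, sizeof u);
        return u;
    }

    // Builds a quiet float NaN whose lowest mantissa byte holds `payload`.
    inline float taggedFloatNaN(std::uint8_t payload) {
        std::uint32_t u = bitsOf(std::numeric_limits<float>::quiet_NaN()) | payload;
        float f;
        std::memcpy(&f, &u, sizeof f);
        return f;
    }

    // Mantissa widths: 52 (double) - 23 (float) = 29 zero bits appended at the low end.
    constexpr int kMantissaShift = 52 - 23;

    // Reads the payload back from a double obtained by widening a tagged float NaN.
    // Assumption: the conversion preserved the payload bits.
    inline std::uint8_t payloadAfterWidening(double d) {
        return static_cast<std::uint8_t>(bitsOf(d) >> kMantissaShift);
    }
    ```

    With these helpers, widening taggedFloatNaN(0x02) to double is expected to leave the lowest byte of the double at zero, while payloadAfterWidening recovers 0x02 from bits 29 to 36.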

    Also, it's worth noting what is stated on this page: https://en.cppreference.com/w/cpp/numeric/math/isnan

    Copying a NaN is not required, by IEEE-754, to preserve its bit representation (sign and payload), though most implementation do.

    I assume the same holds for converting such values between floating-point types, so that, even ignoring other causes of undefined behavior in OP's code, whether a method of NaN-boxing works or not is actually implementation-defined.

    In my earlier attempts at answering this question, I used explicit bit shifting by different offsets to achieve the result, but, as jpo38 also found out, the easiest way is to always generate a float NaN and then convert it to the required type.

    The Standard Library function std::nanf could be used to generate a "customized" float NaN, but in the following demo snippet I won't use it.
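    For completeness, this is roughly what using std::nanf looks like. It is only a sketch: how the string tag is mapped to mantissa bits is implementation-defined, and some implementations may ignore it, so the result should be inspected rather than assumed. The names libraryTaggedNaN and floatBits are hypothetical helpers.

    ```cpp
    #include <cassert>
    #include <cmath>
    #include <cstdint>
    #include <cstring>

    // Asks the library for a quiet NaN tagged with `tag` (for example "42").
    // The mapping from the tag to payload bits is implementation-defined.
    inline float libraryTaggedNaN(const char* tag) {
        return std::nanf(tag);
    }

    // Raw bits of a float, for inspecting what the implementation produced
    // (assumes a 32-bit float).
    inline std::uint32_t floatBits(float f) {
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        return u;
    }
    ```

    Printing floatBits(libraryTaggedNaN("42")) shows where, if anywhere, a given implementation placed the tag.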

    #include <cstdint>
    #include <limits>
    #include <cstring>
    #include <cassert>
    #include <type_traits>
    #include <iostream>
    #include <bitset>
    #include <array>
    #include <climits>
    
    namespace my {
    
    // Waiting for C++20 std::bit_cast
    // source: https://en.cppreference.com/w/cpp/numeric/bit_cast
    template <class To, class From>
    typename std::enable_if<
        (sizeof(To) == sizeof(From)) &&
        std::is_trivially_copyable<From>::value &&
        std::is_trivial<To>::value,
        // this implementation requires that To is trivially default constructible
        To>::type
    // constexpr support needs compiler magic
    bit_cast(const From &src) noexcept
    {
        To dst;
        std::memcpy(&dst, &src, sizeof(To));
        return dst;
    }
    
    template <typename T, std::size_t Size = sizeof(T)>
    void print_bits(T x)
    {
        std::array<unsigned char, Size> buf;
        std::memcpy(buf.data(), &x, Size);
        for (auto it = buf.crbegin(); it != buf.crend(); ++it)
        {
            std::bitset<CHAR_BIT> b{*it};
            std::cout << b.to_string();
        }
        std::cout << '\n';
    }
    
    // The following assumes that both floats and doubles store the mantissa
    // in the lower bits and that while casting a NaN (float->double or double->float)
    // the most significant of those aren't changed
    template <typename T>
    auto boxed_nan(uint8_t data = 0) -> typename std::enable_if<std::numeric_limits<T>::has_quiet_NaN, T>::type
    {
        // Store the payload in a float NaN, then convert it to T (float, double, ...)
        return static_cast<T>(bit_cast<float>(
            bit_cast<uint32_t>(std::numeric_limits<float>::quiet_NaN()) |
            static_cast<uint32_t>(data)
        ));
    }
    
    template <typename T>
    uint8_t unbox_nan(T num)
    {
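        // Convert back to float first, then keep only the lowest byte (the payload)
        return static_cast<uint8_t>(bit_cast<uint32_t>(static_cast<float>(num)) & 0xFFu);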
    }
    
    } // End of namespace 'my'
    
    
    int main()
    {
        auto my_nan = my::boxed_nan<float>(42);
        my::print_bits(my_nan);
        my::print_bits(static_cast<double>(my_nan));
        assert(my::unbox_nan(my_nan) == 42);
        assert(my::unbox_nan(static_cast<double>(my_nan)) == 42);
    
        auto my_d_nan = my::boxed_nan<double>(17);
        my::print_bits(my_d_nan);
        my::print_bits(static_cast<float>(my_d_nan));
        assert(my::unbox_nan(my_d_nan) == 17);
        assert(my::unbox_nan(static_cast<float>(my_d_nan)) == 17);
    
        auto my_ld_nan = my::boxed_nan<long double>(9);
        assert(my::unbox_nan(my_ld_nan) == 9);
        assert(my::unbox_nan(static_cast<double>(my_ld_nan)) == 9);
    }
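
    To connect this back to the facet in the question, here is one possible sketch of a do_put(double) that narrows the value back to float before reading the payload. It rests on the same assumptions as above (IEEE-754 formats, conversions that keep the most significant mantissa bits), and it only works if every custom NaN was originally built as a float NaN. A payload written into the lowest byte of a double, as customizeNaN<double> does, would be dropped by the narrowing. The names boxedNaN, unboxNaN, BoxedNaNPut and render are hypothetical.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <limits>
    #include <locale>
    #include <sstream>
    #include <string>

    // Builds a NaN of type T from a float NaN carrying `data` in its lowest mantissa byte.
    template <typename T>
    T boxedNaN(std::uint8_t data) {
        float f = std::numeric_limits<float>::quiet_NaN();
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        u |= data;
        std::memcpy(&f, &u, sizeof f);
        return static_cast<T>(f);
    }

    // Recovers the byte by narrowing to float first, then reading the lowest byte.
    template <typename T>
    std::uint8_t unboxNaN(T value) {
        float f = static_cast<float>(value);
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        return static_cast<std::uint8_t>(u & 0xFFu);
    }

    class BoxedNaNPut : public std::num_put<char> {
    public:
        explicit BoxedNaNPut(std::size_t refs = 0) : std::num_put<char>(refs) {}

    protected:
        iter_type do_put(iter_type out, std::ios_base& str, char_type fill,
                         double v) const override {
            if (!std::isnan(v))
                return std::num_put<char>::do_put(out, str, fill, v);
            // v may have started life as a float; the payload survives the round trip
            // only under the assumptions stated in the lead-in.
            const unsigned mask = unboxNaN(v);
            std::string text = "Not a Number";
            if (mask != 0) {
                std::ostringstream tag;
                tag << "Custom Not a Number(0x" << std::hex << mask << ")";
                text = tag.str();
            }
            return std::copy(text.begin(), text.end(), out);
        }
    };

    // Formats a value through a stream imbued with the facet above.
    template <typename T>
    std::string render(T value) {
        std::ostringstream os;
        os.imbue(std::locale(os.getloc(), new BoxedNaNPut));
        os << value;
        return os.str();
    }
    ```

    With this, render(boxedNaN<float>(0x01)) and render(boxedNaN<double>(0x01)) should produce the same text, because both reach do_put as a double whose payload sits in the upper mantissa bits.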