Search code examples
pythonc++floating-pointdouble

convert python float to 32 bit object


I'm trying to use python to investigate the effect of C++ truncating doubles to floats.

In C++ I have relativistic energies and momenta which are cast to floats, and I'm trying to work out whether at these energies saving them as doubles would actually result in any improved precision in the difference between energy and momentum.

I chose python because it seemed like a quick and easy way to look at this with some test energies, but now I realise that the difference between C++ 32 bit floats and 64 bit doubles isn't reflect in python.

I haven't yet found a simple way of taking a number and reducing the number of bits used to store it in python. Please can someone enlighten me?


Solution

  • I'd suggest using Numpy as well. It exposes various data types including C style floats and doubles.

    Other useful tools are the C++17 style hex encoding and the decimal module for getting accurate decimal expansions.

    For example:

    import numpy as np
    from decimal import Decimal
    
    for ftype in (np.float32, np.float64):
        v = np.exp(ftype(1))
        pyf = float(v)
        print(f"{v.dtype} {v:20} {pyf.hex()} {Decimal(pyf)}")
    

    giving

    float32   2.7182819843292236 0x1.5bf0aa0000000p+1 2.7182819843292236328125
    float64    2.718281828459045 0x1.5bf0a8b145769p+1 2.718281828459045090795598298427648842334747314453125
    

    Unfortunately the hex-encoding is a bit verbose for float32s (i.e. the zeros are redundant), and the decimal module doesn't know about Numpy floats, so you need to convert it to a Python-native float first. But given that binary32's can be directly converted to binary64's this doesn't seem too bad.

    Just thought, that it sounds like you might want these written out to a file. If so, Numpy scalars (and ndarrays) support the buffer protocol, which means you can just write them out or use bytes(v) to get the underlying bytes.