Converting floating point to fixed point


In C++, what's the generic way to convert any floating point value (float) to fixed point (int, 16:16 or 24:8)?

EDIT: For clarification, fixed-point values have two parts: an integer part and a fractional part. The integer part can be represented by a signed or unsigned integer data type. The fractional part is represented by an unsigned integer data type.

Let's make an analogy with money for the sake of clarity. The fractional part may represent cents -- a fractional part of a dollar. The range of the 'cents' data type would be 0 to 99. If an 8-bit unsigned integer were used for the fractional part instead, each whole unit would be split into 256 equal parts (0 to 255).
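
To make the analogy concrete, here is a minimal sketch of splitting a non-negative value into a whole part and an 8-bit fraction counted in 1/256 steps. The names `Fixed8`, `split_8bit_fraction` and `pack_24_8` are made up for illustration, and the sketch assumes the value is non-negative and fits in 24 bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical illustration type: whole units plus a fraction in 1/256 steps.
struct Fixed8 {
    std::uint32_t whole;  // integer part (the "dollars")
    std::uint8_t  frac;   // fractional part in 256ths (like cents, but 0..255)
};

// Splits a non-negative value; the fraction is truncated, not rounded.
// Assumption: 0 <= value < 2^24 so the result also fits a 24:8 layout.
inline Fixed8 split_8bit_fraction(double value) {
    assert(value >= 0.0 && value < 16777216.0);
    const double whole = std::floor(value);
    Fixed8 result;
    result.whole = static_cast<std::uint32_t>(whole);
    result.frac  = static_cast<std::uint8_t>((value - whole) * 256.0);
    return result;
}

// Packs both parts into one 24:8 raw value, i.e. the number scaled by 256.
inline std::uint32_t pack_24_8(Fixed8 f) {
    return (f.whole << 8) | f.frac;
}
```

For example, 3.75 splits into a whole part of 3 and a fraction of 192 (192/256 = 0.75), and packs to 960, which is 3.75 * 256.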

I hope that clears things up.


Solution

  • Here you go:

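    // A signed fixed-point 16:16 class (assumes short is 16 bits)
    #include <cmath>  // std::floor
    #include <limits> // std::numeric_limits
    
    class FixedPoint_16_16
    {
        short          intPart;  // whole part, rounded toward negative infinity
        unsigned short fracPart; // fraction in units of 1/65536
    
    public:
        FixedPoint_16_16(double d)
        {
            *this = d; // calls operator=
        }
    
        FixedPoint_16_16& operator=(double d)
        {
            // floor (not truncation) keeps the fraction in [0, 1) for negative values
            const double whole = std::floor(d);
            intPart = static_cast<short>(whole);
            // Scale only the fractional remainder by 2^16
            fracPart = static_cast<unsigned short>(
                        (std::numeric_limits<unsigned short>::max() + 1.0) * (d - whole));
            return *this;
        }
    
        // Other operators can be defined here
    };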
    

    EDIT: Here's a more general class based on another common way to deal with fixed-point numbers (which KPexEA pointed out):

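    #include <cstddef> // std::size_t
    
    template <class BaseType, std::size_t FracDigits>
    class fixed_point
    {
        // 2^FracDigits; shifting BaseType(1) avoids overflowing a plain int
        const static BaseType factor = BaseType(1) << FracDigits;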
    
        BaseType data;
    
    public:
        fixed_point(double d)
        {
            *this = d; // calls operator=
        }
    
        fixed_point& operator=(double d)
        {
            data = static_cast<BaseType>(d*factor);
            return *this;
        }
    
        BaseType raw_data() const
        {
            return data;
        }
    
        // Other operators can be defined here
    };
    
    
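    // The class has no default constructor, so an initial value is required.
    fixed_point<int, 8> fp1(1.5);           // Signed 24:8 (if int is 32 bits); raw_data() == 384
    fixed_point<unsigned int, 16> fp2(1.5); // Unsigned 16:16 (if int is 32 bits); raw_data() == 98304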
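
The classes above truncate when converting and offer no way back to floating point. As a rough sketch of how rounding and the reverse conversion could look, here are two free functions. `to_fixed` and `to_double` are hypothetical helpers, not part of the classes above, and they assume a signed 32-bit raw value with no saturation on overflow.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: converts a double to a signed fixed-point raw value
// with FracBits fractional bits, rounding to the nearest representable step.
template <int FracBits>
std::int32_t to_fixed(double value) {
    static_assert(FracBits > 0 && FracBits < 31, "FracBits must fit in int32_t");
    const double scale = static_cast<double>(std::int64_t{1} << FracBits);
    const double scaled = std::round(value * scale);
    // Assumption: the caller keeps the value in range; nothing is clamped here.
    assert(scaled >= -2147483648.0 && scaled <= 2147483647.0);
    return static_cast<std::int32_t>(scaled);
}

// Hypothetical helper: converts a raw fixed-point value back to a double.
template <int FracBits>
double to_double(std::int32_t raw) {
    const double scale = static_cast<double>(std::int64_t{1} << FracBits);
    return raw / scale;
}
```

With these, `to_fixed<16>(1.5)` yields 98304 and `to_double<16>(98304)` yields 1.5 again. A value such as 0.1 is not exactly representable: `to_fixed<8>(0.1)` rounds 25.6 up to 26, which converts back to 0.1015625.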