julia

Why does Float16(1.1)-Float16(1)=Float16(0.0996)?


I am trying to compute Float16(1.1)-Float16(1) by hand. Since Float16 allows 10 bits for the fraction, I do the subtraction by hand with 10 bits of precision, but I do not get 0.0996.

Can someone step me through how to do this subtraction using the bits?


Solution

  • It seems the comments have already answered your question, but for anyone else it may be useful to add that you can easily inspect the actual bit patterns with bitstring to see what is going on in such cases:

    julia> bitstring(Float16(1.1))
    "0011110001100110"
    
    julia> bitstring(Float16(1.0))
    "0011110000000000"
    
    julia> bitstring(Float16(1.1)-Float16(1.0))
    "0010111001100000"
    

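    where the bits are divided according to the IEEE 754 half-precision format: from most to least significant, 1 sign bit, 5 exponent bits (stored with a bias of 15), and 10 fraction bits.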
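
    To make the field boundaries easier to see, here is a minimal sketch that cuts each bit string into sign, exponent, and fraction. It assumes the standard half-precision split of 1, 5, and 10 bits; `fields` is a hypothetical helper name, not something from Julia's Base.

    ```julia
    # Split a Float16 bit pattern into its sign, exponent, and fraction fields.
    # `fields` is a hypothetical helper, not part of Julia's Base.
    function fields(x::Float16)
        s = bitstring(x)   # 16-character string of '0' and '1'
        return (sign = s[1:1], exponent = s[2:6], fraction = s[7:16])
    end

    for x in (Float16(1.1), Float16(1.0), Float16(1.1) - Float16(1.0))
        f = fields(x)
        println(f.sign, " ", f.exponent, " ", f.fraction)
    end
    # prints:
    # 0 01111 0001100110
    # 0 01111 0000000000
    # 0 01011 1001100000
    ```

    The exponent field 01111 is 15, which after removing the bias gives 2^0, and 01011 is 11, which gives 2^-4.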

    For reference, the "implicit bit" mentioned in the comments is described as follows:

    The format is assumed to have an implicit lead bit with value 1 unless the exponent field is stored with all zeros. Thus only 10 bits of the significand appear in the memory format but the total precision is 11 bits.
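
    With the implicit bit restored, the subtraction can be stepped through on integer significands. The following is only a sketch for normal (non-subnormal) values; `decode` is a hypothetical helper, and the bias of 15 is the standard half-precision value rather than something stated above.

    ```julia
    # Decode a normal Float16 into an integer significand `sig` and a
    # power of two `pow`, so that x == sig * 2.0^pow exactly.
    # `decode` is a hypothetical helper; it ignores zero, subnormals, Inf, NaN.
    function decode(x::Float16)
        bits = reinterpret(UInt16, x)
        e    = Int((bits >> 10) & 0x1f)   # 5-bit biased exponent
        frac = Int(bits & 0x03ff)         # 10 stored fraction bits
        sig  = frac + 1024                # restore the implicit leading 1 (11 bits)
        return sig, e - 15 - 10           # remove the bias, scale fraction by 2^-10
    end

    a_sig, a_pow = decode(Float16(1.1))   # (1126, -10): 1126/1024 = 1.099609375
    b_sig, b_pow = decode(Float16(1.0))   # (1024, -10)

    # Both operands share the same exponent, so the significands subtract directly.
    d = a_sig - b_sig                     # 102, i.e. 0b1100110

    println(d * 2.0^a_pow)                # 0.099609375
    println(Float16(d * 2.0^a_pow) == Float16(1.1) - Float16(1.0))   # true
    ```

    The rounding happens when 1.1 is converted to Float16 (it becomes 1126/1024), not in the subtraction, which is exact here. Normalizing 102 = 0b1100110 to a leading 1 shifts zeros into the low bits, which gives the fraction 1001100000 and the exponent 2^-4 seen in the bit string of the result.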