Tags: python, precision, ieee-754

Why does 2x - x == x in IEEE floating point precision?


I would expect this to only hold when the last bit of the mantissa is 0. Otherwise, in order to subtract them (since their exponents differ by 1), x would lose a bit of precision first and the result would either end up being rounded up or down.

But a quick experiment shows that it seems to always hold (assuming x and 2x are finite) for any random number, including those with a trailing 1 bit.

import random
import struct
from collections import Counter


def float_to_bits(f: float) -> int:
    """
    Convert a double-precision floating-point number to a 64-bit integer.
    """
    # Pack the float into 8 bytes, then unpack as an unsigned 64-bit integer
    return struct.unpack(">Q", struct.pack(">d", f))[0]


def check_floating_point_precision(num_trials: int) -> tuple[Counter, float]:
    true_count = 0
    false_count = 0
    bit_counts = Counter()

    for _ in range(num_trials):
        x = random.uniform(0, 1)
        if 2 * x - x == x:
            true_count += 1
        else:
            false_count += 1

        bits = float_to_bits(x)

        # Extract the last three bits of the mantissa
        last_three_bits = bits & 0b111
        bit_counts[last_three_bits] += 1

    return (bit_counts, true_count / num_trials)


num_trials = 1_000_000
(bit_counts, proportion_true) = check_floating_point_precision(num_trials)

print(f"The proportion of times 2x - x == x holds true: {proportion_true:.6f}")
print("Distribution of last three bits (mod 8):")
for bits_value in range(8):
    print(f"{bits_value:03b}: {bit_counts[bits_value]} occurrences")
Output:

The proportion of times 2x - x == x holds true: 1.000000
Distribution of last three bits (mod 8):
000: 312738 occurrences
001: 62542 occurrences
010: 125035 occurrences
011: 62219 occurrences
100: 187848 occurrences
101: 62054 occurrences
110: 125129 occurrences
111: 62435 occurrences

Solution

  • If we had to do arithmetic only in the floating-point format, even for internal values during the arithmetic, then, yes, 2*x - x would not always yield x. For example, with four-bit significands, we could have:

    Expression   Value/Calculation
    x            1.001•2^0 (9)
    2*x          1.001•2^1 (18)
    2*x − x      1.001•2^1 − 1.001•2^0
                 = 1.001•2^1 − 0.100•2^1 (shifted right operand and lost a bit)
                 = 0.101•2^1 = 1.010•2^0 (10)
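The truncating arithmetic in that table can be sketched as a toy model. This is a hypothetical helper (`naive_sub` is not a real library function): values are a 4-bit significand `sig` in 1.xxx form plus an exponent, and bits shifted out during alignment are simply dropped. It illustrates the failure mode, not how real hardware works:

```python
# Toy model: value = (sig / 8) * 2**exp, where sig is an integer 8..15
# representing a 4-bit significand 1.xxx in binary.

def naive_sub(sig_a: int, exp_a: int, sig_b: int, exp_b: int) -> tuple[int, int]:
    """Subtract b from a, discarding bits shifted out (no guard bits)."""
    shift = exp_a - exp_b          # assume exp_a >= exp_b
    sig_b >>= shift                # aligned; the shifted-out bit is lost
    sig = sig_a - sig_b
    exp = exp_a
    while sig < 8:                 # renormalize back to 1.xxx form
        sig <<= 1
        exp -= 1
    return sig, exp

# 2*x = 1.001•2^1 minus x = 1.001•2^0:
sig, exp = naive_sub(0b1001, 1, 0b1001, 0)
print(bin(sig), exp)   # 0b1010 0  ->  1.010•2^0, not the original 1.001•2^0
```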

    However, that is not how floating-point implementations work. To get the right results, they use either more digits internally or more sophisticated algorithms or both. IEEE-754 specifies that the result of an elementary operation is the value you would get by computing the exact real-number arithmetic result with no error or rounding or limitations on digits and then rounding that real number to the nearest value representable in the destination format, using the rounding rule in effect for the operation. (Most commonly, that rounding rule is to round to the nearest value, breaking ties in favor of the one with an even low digit in its significand.)
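This correct-rounding guarantee can be checked from Python by comparing a single operation against exact rational arithmetic — a sketch using the standard `fractions` module (`math.nextafter` needs Python 3.9+):

```python
import math
from fractions import Fraction

a, b = 0.1, 0.2
exact = Fraction(a) + Fraction(b)   # exact real-number sum of the two doubles
computed = a + b                    # one IEEE-754 addition, correctly rounded

# `computed` must be at least as close to the exact sum as either of its
# neighboring representable doubles is.
below = math.nextafter(computed, -math.inf)
above = math.nextafter(computed, math.inf)
assert abs(Fraction(computed) - exact) <= abs(Fraction(below) - exact)
assert abs(Fraction(computed) - exact) <= abs(Fraction(above) - exact)
print("a + b is the correctly rounded sum")
```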

    A consequence of this requirement is that, if the mathematical result is representable in the format, the computed result must be that mathematical result. When the mathematical result is representable, there is never any rounding error in a properly implemented elementary operation.
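This is why the experiment in the question never fails: 2*x is exact (only the exponent changes), x itself is representable, and the subtraction falls under the exactness guarantee (this is related to the Sterbenz lemma, which says the difference of two floats within a factor of two of each other is exact). A quick sketch targeting the worst case the question worried about, a trailing 1 bit (again needs Python 3.9+ for `math.nextafter`):

```python
import math
import random
import struct

def last_bit(f: float) -> int:
    """Return the lowest significand bit of a double."""
    return struct.unpack(">Q", struct.pack(">d", f))[0] & 1

# Construct doubles whose significand ends in 1 and confirm the
# subtraction is still exact for every one of them.
random.seed(0)
for _ in range(100_000):
    x = random.uniform(0.0, 1.0)
    if last_bit(x) == 0:
        x = math.nextafter(x, 1.0)   # force a trailing 1 bit
    assert last_bit(x) == 1
    assert 2 * x - x == x            # exact: the result is representable
print("all trailing-1 cases exact")
```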