Search code examples
mathtruncationapproximate

What does finding the limits of a truncated value mean?


X_hat is an approximation of X. It is given by the X_t, the truncation of X using t bits.

Find the limits of X_t / X.

X_hat and X_t are represented in floating-point binary. From my understand:

If t = 3 and X_hat = 1, X_t = 1.01

1/1 = 1 is the upper limit? What about the lower limit?


Solution

  • The following assumes the question is about significant bits (for an example in decimal, 3141592 truncated to 3 significant digits would be 3140000, and truncated to 5 significant digits would be 3141500). Since the question is about the relative error, it does not matter whether or where there is a decimal point, so it can be assumed without loss of generality that the numbers are integers.

    If X = 0 then X̂ = X = 0 and X̂ / X is undefined.

    Otherwise, if 0 < X < 2^(t-1) then X has at most t-1 significant bits, and truncation leaves X unchanged, so X̂ = X and X̂ / X = 1.

    Otherwise, if X >= 2^(t-1) then X can be written as X = 2^n q + r where n >= 0, 2^(t-1) <= q < 2^t and 0 <= r < 2^n. The leftmost t bits of X are q, so the truncation of X to t significant bits is X̂ = 2^n q.

    Then X̂ / X = 2^n q / (2^n q + r) = 1 - 1 / (1 + r / (2^n q)). The expression is decreasing in r and increasing in q, which combined with r < 2^n and q >= 2^(t-1) gives the lower bound:

    X̂ / X  >  2^(n+t-1) / (2^(n+t-1) + 2^n)
           =  1 - 1 / (1 + 2^(t-1))
    

    For a fixed n > 0 the exact lower bound derived from r <= 2^n - 1 and q >= 2^(t-1) is:

    X̂ / X  >=  2^(n+t-1) / (2^(n+t-1) + 2^n - 1)
            =  1 - (2^n - 1) / (2^(n+t-1) + 2^n - 1)
            =  1 - 1 / (1 + 2^(n+t-1) / (2^n - 1))
            =  1 - 1 / (1 + 2^(t-1) / (1 - 1 / 2^n))
    

    This lower bound is attained for X = 2^(n+t-1) + 2^n - 1 corresponding to X̂ = 2^(n+t-1). In the limit n → ∞ it reduces to the lower bound independent of n derived at the previous step.