X_hat is an approximation of X. It is given by the X_t, the truncation of X using t bits.
Find the limits of X_t / X.
X_hat and X_t are represented in floating-point binary. From my understand:
If t = 3 and X_hat = 1, X_t = 1.01
1/1 = 1 is the upper limit? What about the lower limit?
The following assumes the question is about significant bits (for an example in decimal, 3141592
truncated to 3 significant digits would be 3140000
, and truncated to 5 significant digits would be 3141500
). Since the question is about the relative error, it does not matter whether or where there is a decimal point, so it can be assumed without loss of generality that the numbers are integers.
If X = 0
then X̂ = X = 0
and X̂ / X
is undefined.
Otherwise, if 0 < X < 2^(t-1)
then X
has at most t-1
significant bits, and truncation leaves X
unchanged, so X̂ = X
and X̂ / X = 1
.
Otherwise, if X >= 2^(t-1)
then X
can be written as X = 2^n q + r
where n >= 0
, 2^(t-1) <= q < 2^t
and 0 <= r < 2^n
. The leftmost t
bits of X
are q
, so the truncation of X
to t
significant bits is X̂ = 2^n q
.
Then X̂ / X = 2^n q / (2^n q + r) = 1 - 1 / (1 + r / (2^n q))
. The expression is decreasing in r
and increasing in q
, which combined with r < 2^n
and q >= 2^(t-1)
gives the lower bound:
X̂ / X > 2^(n+t-1) / (2^(n+t-1) + 2^n)
= 1 - 1 / (1 + 2^(t-1))
For a fixed n > 0
the exact lower bound derived from r <= 2^n - 1
and q >= 2^(t-1)
is:
X̂ / X >= 2^(n+t-1) / (2^(n+t-1) + 2^n - 1)
= 1 - (2^n - 1) / (2^(n+t-1) + 2^n - 1)
= 1 - 1 / (1 + 2^(n+t-1) / (2^n - 1))
= 1 - 1 / (1 + 2^(t-1) / (1 - 1 / 2^n))
This lower bound is attained for X = 2^(n+t-1) + 2^n - 1
corresponding to X̂ = 2^(n+t-1)
. In the limit n → ∞
it reduces to the lower bound independent of n
derived at the previous step.