I'm working with 64-bit floating-point arithmetic as defined by IEEE 754. The smallest subnormal number is:
2^-1074 = 5e-324 = 5 * 10^-16 * 10^-308
Adding the latter to realmin results in:
2^-1022 + 2^-1074 = 2.2250738585072014 * 10^-308 + 5 * 10^-16 * 10^-308 = (2.2250738585072014 + 0.0000000000000005) * 10^-308 = 2.2250738585072019 * 10^-308
When performing the addition in Python the result is slightly different. Here's the simple script:
import numpy as np
realmin = np.power(2.0, -1022)
print( "realmin\t\t" + str(realmin) )
smallestSub = np.power(2.0, -1074)
print( "smallest sub\t" + str(smallestSub) )
realminSucc = realmin + smallestSub
print( "sum\t\t" + str(realminSucc) )
The output is:
realmin 2.2250738585072014e-308
smallest sub 5e-324
sum 2.225073858507202e-308
Why does it round the sum? There's space for one extra digit, as shown by the realmin output.
Python is not strict about floating-point behavior, so some of the following is speculative; it depends on the implementation.
Java and JavaScript require the default conversion of floating-point values to strings to use just enough decimal digits to uniquely distinguish the floating-point value. For example, if the representable values in some floating-point format were 3, 3.0625, 3.125, 3.1875, and so on, then converting 3.0625 to a string yields “3.06” because that uniquely distinguishes it from 3 and 3.125, and it must be that long because the shorter “3.1” does not distinguish it from 3.125. But converting 3.125 to a string yields “3.1” because that is enough for it; converting 3.1 to the nearest representable value yields 3.125.
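The "just enough digits" rule can be illustrated with a brute-force sketch in Python. The helper name shortest_roundtrip is hypothetical, and production implementations use much more efficient algorithms; this version simply tries increasing precision until the string converts back to the same double, so it may differ from a true shortest-digits routine in rare edge cases.

```python
def shortest_roundtrip(x: float) -> str:
    """Return x formatted with the fewest significant digits that still
    convert back to exactly the same double (brute-force illustration)."""
    for digits in range(1, 18):
        # 'e' formatting with precision p prints p + 1 significant digits.
        text = f"{x:.{digits - 1}e}"
        if float(text) == x:
            return text
    # 17 significant digits always round-trip for a 64-bit double,
    # so this line is only a safety net.
    return repr(x)


realmin = 2.0 ** -1022        # smallest normal double
smallest_sub = 2.0 ** -1074   # smallest subnormal double

print(shortest_roundtrip(realmin))
print(shortest_roundtrip(smallest_sub))
print(shortest_roundtrip(realmin + smallest_sub))
```

Under these assumptions, realmin needs 17 significant digits, while the sum needs only 16, which matches the output shown in the question.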
Because Java and JavaScript require this, subroutines for doing those conversions are becoming common, and a Python implementation might use them since they are readily available. This behavior would explain the results you see in your Python implementation.
Although the question states “2^-1074 = 5e-324”, this is not true. 2^-1074 is exactly 4.940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625 * 10^-324. The exact values of the floating-point numbers matter in the formatting. Near 2^-1022, the representable values are (decimal expansions truncated):
2^-1022 - 2^-1074 = 2.225073858507200889024... * 10^-308
2^-1022 = 2.225073858507201383090... * 10^-308
2^-1022 + 2^-1074 = 2.225073858507201877155... * 10^-308
2^-1022 + 2 * 2^-1074 = 2.225073858507202371221... * 10^-308
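The exact decimal values of the doubles around 2^-1022 can be reproduced with Python's decimal module, since converting a float to Decimal is exact. This is only a sketch: the helper name exact_digits and the choice of 22 significant digits are assumptions made for illustration.

```python
from decimal import Decimal

realmin = 2.0 ** -1022   # smallest normal double
step = 2.0 ** -1074      # smallest subnormal; also the spacing near realmin


def exact_digits(x: float, digits: int = 22) -> str:
    """Format the exact value of x, rounded to `digits` significant digits."""
    # Decimal(x) captures the double's value exactly; only the formatting rounds.
    return f"{Decimal(x):.{digits - 1}e}"


neighbors = [
    ("2^-1022 - 2^-1074", realmin - step),
    ("2^-1022", realmin),
    ("2^-1022 + 2^-1074", realmin + step),
    ("2^-1022 + 2 * 2^-1074", realmin + 2 * step),
]

for label, value in neighbors:
    print(f"{label:<24}{exact_digits(value)}")
```

Each addition and subtraction here is exact, because all four results are themselves representable doubles.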
Now we can see why 2^-1022 must be displayed as “2.2250738585072014e-308”. If it were displayed with one fewer digit, as “2.225073858507201e-308”, that would be closer to 2^-1022 - 2^-1074 than to 2^-1022, so it would be wrong.
However, for 2^-1022 + 2^-1074, “2.225073858507202e-308” suffices because the closest representable value to that is 2^-1022 + 2^-1074. 2^-1022 + 2 * 2^-1074 is farther away.
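The distance argument can be checked with exact rational arithmetic. The following is a minimal sketch, assuming the spacing between representable values near 2^-1022 is 2^-1074 on both sides; the helper name nearest_offset is hypothetical.

```python
from fractions import Fraction

REALMIN = Fraction(2) ** -1022   # exact value of the smallest normal double
STEP = Fraction(2) ** -1074      # exact spacing of doubles around REALMIN


def nearest_offset(text: str) -> int:
    """Return k such that REALMIN + k * STEP is the representable value
    nearest to the decimal string `text`."""
    # Fraction parses the decimal string exactly, so no rounding occurs
    # until the final round() to the nearest multiple of STEP.
    return round((Fraction(text) - REALMIN) / STEP)


for text in ("2.2250738585072014e-308",
             "2.225073858507201e-308",
             "2.225073858507202e-308"):
    print(text, "-> realmin + (%d) * 2^-1074" % nearest_offset(text))
```

An offset of 0 means the string converts back to 2^-1022, -1 means it lands on the largest subnormal, and 1 means it lands on 2^-1022 + 2^-1074.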