Tags: python, numpy, floating-point, precision, ieee-754

Why does Python round this sum?


I'm working with 64-bit floating-point arithmetic as defined by IEEE 754. The smallest subnormal number is:
2^-1074 = 5e-324 = 5 * 10^-16 * 10^-308

Adding the latter to realmin results in:
2^-1022 + 2^-1074 = 2.2250738585072014 * 10^-308 + 5 * 10^-16 * 10^-308 = (2.2250738585072014 + 0.0000000000000005) * 10^-308 = 2.2250738585072019 * 10^-308

When performing the addition in Python the result is slightly different. Here's the simple script:

import numpy as np

realmin = np.power(2.0, -1022)
print( "realmin\t\t" + str(realmin) )

smallestSub = np.power(2.0, -1074)
print( "smallest sub\t" + str(smallestSub) )

realminSucc = realmin + smallestSub
print( "sum\t\t" + str(realminSucc) )

The output is:

realmin         2.2250738585072014e-308
smallest sub    5e-324
sum             2.225073858507202e-308

Why does it round the sum? There's room for one more digit, as the realmin output shows.


Solution

  • Python is not strict about floating-point behavior, so some of the following is speculative; it depends on the implementation.

    Java and JavaScript require the default conversion of floating-point values to strings to use just enough decimal digits to uniquely distinguish the floating-point value. For example, if the representable values in some floating-point format were 3, 3.0625, 3.125, 3.1875, and so on, then converting 3.0625 to a string yields “3.06” because that uniquely distinguishes it from 3 and 3.125, and it must be that long because the shorter “3.1” does not distinguish it from 3.125. But converting 3.125 to a string yields “3.1” because that is enough for it; converting 3.1 to the nearest representable value yields 3.125.
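The toy format in the example above can be sketched in code. This is only an illustration under stated assumptions: the format whose representable values are the multiples of 1/16 is hypothetical, the names `STEP`, `nearest_on_grid`, and `shortest_decimal` are invented here, and the search simply rounds the exact value to more and more digits, which is cruder than what production conversion routines do.

```python
from decimal import Decimal
from fractions import Fraction

# Hypothetical toy format: the representable values are the multiples of 1/16.
STEP = Fraction(1, 16)

def nearest_on_grid(x: Fraction) -> Fraction:
    """Return the representable value closest to x (ties go to the even multiple)."""
    return round(x / STEP) * STEP

def shortest_decimal(value: Fraction, max_digits: int = 10) -> str:
    """Return value with the fewest fractional digits that still map back to it."""
    exact = Decimal(value.numerator) / Decimal(value.denominator)
    for digits in range(max_digits + 1):
        # Round the exact value to `digits` places after the decimal point.
        candidate = exact.quantize(Decimal(1).scaleb(-digits))
        # Accept the candidate if reading it back lands on the same grid point.
        if nearest_on_grid(Fraction(candidate)) == value:
            return str(candidate)
    raise ValueError("value is not representable in the toy format")

print(shortest_decimal(Fraction(49, 16)))  # 3.0625 is printed as 3.06
print(shortest_decimal(Fraction(25, 8)))   # 3.125 is printed as 3.1
```

The two printed strings match the “3.06” and “3.1” worked out by hand in the paragraph above.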

    Because Java and JavaScript require this, subroutines for doing those conversions are becoming common, and a Python implementation might use them since they are readily available. (CPython, for one, has documented shortest round-trip output for the repr of a float since version 3.1.) This behavior would explain the results you see in your Python implementation.
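One way to check whether a particular Python implementation behaves like this is to search for the shortest precision that round-trips and compare it with what `repr` prints. The following is a rough sketch, not the algorithm any implementation actually uses: `shortest_round_trip` is a hypothetical helper, and it assumes that `float()` and the `e` format are correctly rounded, as they are in CPython.

```python
def shortest_round_trip(x: float, max_sig_digits: int = 17) -> str:
    """Return x in e-notation with the fewest significant digits that parse back to x."""
    for sig_digits in range(1, max_sig_digits + 1):
        # The 'e' format takes the number of digits after the decimal point.
        text = f"{x:.{sig_digits - 1}e}"
        if float(text) == x:
            return text
    # Fallback; 17 significant digits are enough for any finite double
    # when the conversions are correctly rounded.
    return repr(x)

realmin = 2.0 ** -1022
smallest_sub = 2.0 ** -1074

# Print the result of the search next to the implementation's own repr.
for value in (realmin, realmin + smallest_sub):
    print(shortest_round_trip(value), repr(value))
```

If the two columns agree, as they do on CPython for these two values, the implementation is printing the shortest string that uniquely identifies each float: 17 significant digits for realmin, but only 16 for the sum.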

    Although the question states “2^-1074 = 5e-324”, this is not exactly true; “5e-324” is only the shortest string that identifies the value. 2^-1074 is exactly 4.940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625 * 10^-324. The exact values of the floating-point numbers matter in the formatting. Near 2^-1022, the representable values are:

    • 2^-1022 − 2^-1074 = 2.2250738585072008890245868760858598876504231122409594654935248025624400092282356951787758888037591552642309780950434312085877387158357291821993020294379224223559819827501242041788969571311791082261043971979604000454897391938079198936081525613113376149842043271751033627391549782731594143828136275113838604094249464942286316695429105080201815926642134996606517803095075913058719846423906068637102005108723282784678843631944515866135041223479014792369585208321597621066375401613736583044193603714778355306682834535634005074073040135602968046375918583163124224521599262546494300836851861719422417646455137135420132217031370496583210154654068035397417906022589503023501937519773030945763173210852507299305089761582519159720757232455434770912461317493580281734466552734375 * 10^-308.
    • 2^-1022 = 2.225073858507201383090232717332404064219215980462331830553327416887204434813918195854283159012511020564067339731035811005152434161553460108856012385377718821130777993532002330479610147442583636071921565046942503734208375250806650616658158948720491179968591639648500635908770118304874799780887753749949451580451605050915399856582470818645113537935804992115981085766051992433352114352390148795699609591288891602992641511063466313393663477586513029371762047325631781485664350872122828637642044846811407613911477062801689853244110024161447421618567166150540154285084716752901903161322778896729707373123334086988983175067838846926092773977972858659654941091369095406136467568702398678315290680984617210924625396728515625 * 10^-308.
    • 2^-1022 + 2^-1074 = 2.2250738585072018771558785585789482407880088486837041956131300312119688603996006965297904292212628858639037013670281908017171296072711910355127227413175152199055740043138804567803233377539881639177387328959246074229270113078053813397081653361296447449529789521218979090783852583365901851789618799885150427514782636076021680436220311292700454832073964845713103912225963935608322440623896907276890186717054549275173986589324810401738228328251245795065655738191038008646911615828719989708647293221449796971546706720399791990809160347625980385995424739847678861180095072511543762389603716215171729816011544604359531284325406441938645324905389137795680915804792405099227413854274942620542640408839836919187418172987793340279242767544565229087538682506419718265533447265625 * 10^-308.
    • 2^-1022 + 2 * 2^-1074 = 2.225073858507202371221524399825492417356801716905076560672932645536733285985283197205297699430014751163740063003020570598281825052988921962169433097257311618680370015095758583081036528065392691763555900744906711111645647364804112062758171723538798309937366264595295182248000398368305570577036006227080633922504922164288936230661591439894977428478987977026639696679140794688312373772389232659678427752122018252042155806801495766953982188063736129641369100312575820243717972293621169304087413797478551780397864281278268544917722045363748655580517781818995617950934297749406849316597964346304638590078974833882923081797242441461636291003104968899481242069589385613709015202152589845793237400783350172912858237869043043055848553508913045817507736501283943653106689453125 * 10^-308.
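Exact expansions like the ones listed above do not have to be worked out by hand. As a small sketch, `decimal.Decimal` converts a Python float to its exact decimal value without rounding; the helper name `exact_decimal` and the labels are just for illustration, and the float additions and subtractions below are themselves exact because every result is representable.

```python
from decimal import Decimal

def exact_decimal(x: float) -> str:
    """Return the exact decimal expansion of a binary float as a string."""
    # Decimal(float) is an exact conversion; no rounding is applied.
    return str(Decimal(x))

realmin = 2.0 ** -1022
smallest_sub = 2.0 ** -1074

values = {
    "2^-1074": smallest_sub,
    "2^-1022 - 2^-1074": realmin - smallest_sub,
    "2^-1022": realmin,
    "2^-1022 + 2^-1074": realmin + smallest_sub,
    "2^-1022 + 2 * 2^-1074": realmin + 2 * smallest_sub,
}

for label, value in values.items():
    print(label, "=", exact_decimal(value))
```

`Decimal` prints the exponent as `E-308` or `E-324` rather than as a power of ten, but the digits are the same as in the list.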

    Now we can see why 2^-1022 must be displayed as “2.2250738585072014e-308”. If it were displayed with one fewer digit, as “2.225073858507201e-308”, that would be closer to 2^-1022 − 2^-1074 than to 2^-1022, so it would be wrong.

    However, for 2^-1022 + 2^-1074, “2.225073858507202e-308” suffices because the closest representable value to that is 2^-1022 + 2^-1074. 2^-1022 + 2 * 2^-1074 is farther away.
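The distance comparisons in the last two paragraphs can be verified with exact rational arithmetic. This is a sketch only: `closest_neighbor` and the dictionary labels are made up for the illustration, and `fractions.Fraction` is used so that no floating-point rounding takes part in the check.

```python
from fractions import Fraction

REALMIN = Fraction(2) ** -1022   # exactly 2^-1022
STEP = Fraction(2) ** -1074      # spacing between adjacent doubles near 2^-1022

# The four representable values discussed above, as exact fractions.
NEIGHBORS = {
    "2^-1022 - 2^-1074": REALMIN - STEP,
    "2^-1022": REALMIN,
    "2^-1022 + 2^-1074": REALMIN + STEP,
    "2^-1022 + 2 * 2^-1074": REALMIN + 2 * STEP,
}

def closest_neighbor(text: str) -> str:
    """Return the label of the representable value nearest to a decimal string."""
    target = Fraction(text)  # Fraction parses decimal strings, including exponents
    return min(NEIGHBORS, key=lambda name: abs(NEIGHBORS[name] - target))

for text in ("2.225073858507201e-308",
             "2.2250738585072014e-308",
             "2.225073858507202e-308"):
    print(text, "->", closest_neighbor(text))
```

The 16-digit string ending in 201 lands on 2^-1022 − 2^-1074, so realmin needs its 17th digit, while the 16-digit string ending in 202 already lands on 2^-1022 + 2^-1074.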