This might be linked to Polars DataFrame - Decimal Precision doubles on mul with Integer.
I am facing issues in rounding at various stages in polars (v1.6.0) multiplication. Here is an example:
from decimal import Decimal
import polars as pl
df = pl.DataFrame({
"a": [2, 5],
"b": [Decimal('2.0'), Decimal('5.0')],
"c": [Decimal('2.00'), Decimal('5.00')],
"d": [Decimal('2.000'), Decimal('5.000')],
"e": [Decimal('2.0000'), Decimal('5.0000')]
})
shape: (2, 5)
┌─────┬──────────────┬──────────────┬──────────────┬──────────────┐
│ a ┆ b ┆ c ┆ d ┆ e │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ decimal[*,1] ┆ decimal[*,2] ┆ decimal[*,3] ┆ decimal[*,4] │
╞═════╪══════════════╪══════════════╪══════════════╪══════════════╡
│ 2 ┆ 2.0 ┆ 2.00 ┆ 2.000 ┆ 2.0000 │
│ 5 ┆ 5.0 ┆ 5.00 ┆ 5.000 ┆ 5.0000 │
└─────┴──────────────┴──────────────┴──────────────┴──────────────┘
# Demo 1
# 10/71 = 0.14084507042253522
dfn = df.with_columns(
pl.col(col).mul(10/71)
for col in df.columns
)
shape: (2, 5)
┌──────────┬──────────────┬──────────────┬──────────────┬──────────────┐
│ a ┆ b ┆ c ┆ d ┆ e │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ decimal[*,2] ┆ decimal[*,4] ┆ decimal[*,6] ┆ decimal[*,8] │
╞══════════╪══════════════╪══════════════╪══════════════╪══════════════╡
│ 0.28169 ┆ 0.20 ┆ 0.2800 ┆ 0.280000 ┆ 0.28160000 │
│ 0.704225 ┆ 0.50 ┆ 0.7000 ┆ 0.700000 ┆ 0.70400000 │
└──────────┴──────────────┴──────────────┴──────────────┴──────────────┘
# Demo 2
dfn = df.with_columns(
pl.col(col).mul(Decimal(10/71).quantize((Decimal('1.0000'))))
for col in df.columns
)
shape: (2, 5)
┌──────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
│ a ┆ b ┆ c ┆ d ┆ e │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ decimal[*,8] ┆ decimal[*,5] ┆ decimal[*,6] ┆ decimal[*,7] ┆ decimal[*,8] │
╞══════════════╪══════════════╪══════════════╪══════════════╪══════════════╡
│ 0.28160000 ┆ 0.28160 ┆ 0.281600 ┆ 0.2816000 ┆ 0.28160000 │
│ 0.70400000 ┆ 0.70400 ┆ 0.704000 ┆ 0.7040000 ┆ 0.70400000 │
└──────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
In Demo 1
above, when I just multiply with 10/71, column b [decimal[*,1]
seems to be multiplied by 0.1
instead of 0.140845...
, c decimal[*,2]
seems to be multiplied by 0.14
, d decimal[*,3]
by 0.140
and e decimal[*,4]
by 0.1408
. In short, whatever is the original decimal scale of the column that is multiplied by 10/71, the same scale of 10/71 seems to be used.
As seen in Demo 2
, I must quantize the scale before I use it for multiplication. Note that I cannot just use .mul(Decimal(10/71))
without quantize, as doing so gives an error BindingsError: "Decimal is too large to fit in Decimal128"
.
Is there a better way than forcing the scale of the multiplier?
(Not withstanding the bug (as it seems to me) where it doubles the scale)
Decimal("2.0")*(10/71)
# TypeError: unsupported operand type(s) for *: 'decimal.Decimal' and 'float'
so the python Decimal package won't even try to do Decimal * float so it's hard to tell what "better" is.
As you note, what polars seems to be doing is converting the float to the same scale as the lhs Decimal. I can't think of a better default than that.
Have polars do two integer operations instead of python creating the float
df.with_columns(
pl.all().mul(10).truediv(71)
)
shape: (2, 5)
┌──────────┬──────────────┬──────────────┬───────────────┬────────────────┐
│ a ┆ b ┆ c ┆ d ┆ e │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ decimal[*,6] ┆ decimal[*,8] ┆ decimal[*,10] ┆ decimal[*,12] │
╞══════════╪══════════════╪══════════════╪═══════════════╪════════════════╡
│ 0.28169 ┆ 0.281690 ┆ 0.28169014 ┆ 0.2816901408 ┆ 0.281690140845 │
│ 0.704225 ┆ 0.704225 ┆ 0.70422535 ┆ 0.7042253521 ┆ 0.704225352112 │
└──────────┴──────────────┴──────────────┴───────────────┴────────────────┘
In this case the mul doubles the scale (which I think is a bug) and then the division seems to add 4 to the scale which I can't figure out (maybe that's a bug too or just an intended hard coded default).
Use the Decimal form of 10/71 specified to the scale you want such as
df.with_columns(
pl.all().mul(Decimal("0.1408"))
)
shape: (2, 5)
┌──────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
│ a ┆ b ┆ c ┆ d ┆ e │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ decimal[*,8] ┆ decimal[*,5] ┆ decimal[*,6] ┆ decimal[*,7] ┆ decimal[*,8] │
╞══════════════╪══════════════╪══════════════╪══════════════╪══════════════╡
│ 0.28160000 ┆ 0.28160 ┆ 0.281600 ┆ 0.2816000 ┆ 0.28160000 │
│ 0.70400000 ┆ 0.70400 ┆ 0.704000 ┆ 0.7040000 ┆ 0.70400000 │
└──────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
In this case (ignoring column a) it looks like it's adding the scales. That is the same behavior as Decimal("2.00")*Decimal("0.1408")
so I think that's right.