The task I am benchmarking is simply clipping values element-wise. I have done this using both numpy and polars. But it turned out that numpy is much faster (~5x) than polars (as shown below).
So, my question: is polars (although it is highly optimized for join/groupby) perhaps not well suited to relatively simple numerical vector/array operations such as the clipping in my example?
import timeit
import numpy as np
import polars as pl
N = 10_000_000
x = np.random.normal(size=N)
y = np.random.normal(size=N)
z = y + 0.5
df = pl.DataFrame({"x": x, "y": y, "z": z})
>>> timeit.timeit(lambda: np.minimum(np.maximum(x, y), z), number=10)
0.60923
>>> timeit.timeit(lambda: df.select(pl.min_horizontal(pl.max_horizontal(pl.col("x"), pl.col("y")), pl.col("z"))), number=10)
3.39337
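For reference, the operation being timed can be sketched in NumPy alone. The snippet below is only an illustration: the smaller array size and the fixed seed are assumptions chosen here, not the benchmark settings. It checks that the nested minimum/maximum clips `x` into the element-wise range `[y, z]` and agrees with `np.clip` given array-valued bounds.

```python
import numpy as np

# Illustrative size and seed; these are assumptions, not the benchmark values.
rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)
y = rng.normal(size=n)  # element-wise lower bound
z = y + 0.5             # element-wise upper bound, always above y

# Clip x into [y, z] element-wise, as in the timed expression.
clipped = np.minimum(np.maximum(x, y), z)

# np.clip accepts array-valued bounds; compare it against the nested form.
same_as_clip = np.array_equal(clipped, np.clip(x, y, z))
within_bounds = bool(np.all((clipped >= y) & (clipped <= z)))
print(same_as_clip, within_bounds)
```

Because `z` is built as `y + 0.5`, the lower bound never exceeds the upper bound, so the order of the two nested calls does not affect the result here.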
As of polars >= 0.17.8, we optimized it a bit more. It is much closer now.
Our horizontal min operation is not really optimized. If you open an issue, we can improve that. Most of our optimization attention has gone to the expensive operations.
To answer your question: no, it depends.
In aggregate, polars might be faster, since many operations can often run in parallel. If you feel it is too slow in one area, open an issue.