Sample df:
import polars as pl
import numpy as np

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "A": [True, True, False, False, False],
    }
)
I want to replace the column random. So far, I've been doing
new = np.arange(5)
df.replace('random', pl.Series(new))
Note that replace is one of the few polars methods that works in place!
But now I'm getting
C:\Users\...\AppData\Local\Temp\ipykernel_18244\1406681700.py:2: DeprecationWarning: `replace` is deprecated. DataFrame.replace is deprecated and will be removed in a future version. Please use
df = df.with_columns(new_column.alias(column_name))
instead.
df = df.replace('random', pl.Series(new))
So, should I do
df = df.with_columns(pl.Series(new).alias('random'))
This seems more verbose, and the in-place modification is gone. Am I doing things right?
Disclaimer: I think the polars developers want to nudge users away from in-place updates. Also, pl.DataFrame.with_columns is a cheap operation: it is heavily optimized and doesn't just copy the underlying data. Hence, using
df = df.with_columns(pl.Series("random", new))
seems like the best approach. See this answer for more information.
Still, if you need in-place updates (e.g. because you implemented a library function whose interface depends on it), you can use pl.DataFrame.replace_column.
new_col = pl.Series("random", np.arange(5))
df.replace_column(df.columns.index(new_col.name), new_col)