
Replace a Polars column with a 1D array


Sample df:

import polars as pl
import numpy as np
df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "A": [True, True, False, False, False],
    }
)

I want to replace the column random. So far, I've been doing:

new = np.arange(5)
df.replace('random', pl.Series(new))

Note that replace is one of the few Polars methods that works in place!

But now I'm getting

C:\Users\...\AppData\Local\Temp\ipykernel_18244\1406681700.py:2: DeprecationWarning: `replace` is deprecated. DataFrame.replace is deprecated and will be removed in a future version. Please use
    df = df.with_columns(new_column.alias(column_name))
instead.
  df = df.replace('random', pl.Series(new)) 

So, should I do

df = df.with_columns(pl.Series(new).alias('random'))

This seems more verbose, and the in-place modification is gone. Am I doing this right?


Solution

  • Disclaimer. I think the Polars developers want to nudge users away from in-place updates. Also, pl.DataFrame.with_columns is a cheap operation: it returns a new DataFrame that shares the unchanged columns with the original instead of copying their underlying data. Hence, using

    df = df.with_columns(pl.Series("random", new))
    

    seems like the best approach. See this answer for more information.


    Still, if you need in-place updates (e.g. because you implemented a library function whose interface depends on them), you can use pl.DataFrame.replace_column, which takes the index of the column to replace and the new Series.

    new_col = pl.Series("random", np.arange(5))
    df.replace_column(df.columns.index(new_col.name), new_col)