python, python-polars

Polars: how to add a column with numerical values?


In pandas, we can just assign directly:

import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# add a single value
df["b"] = 3

# add an existing Series
df["c"] = pd.Series([4, 5])

   a  b  c
0  1  3  4
1  2  3  5

Note that the new numerical Series is not part of the original df; it is the result of some computation.

How do we do the same thing in polars?

import polars as pl

df = pl.DataFrame({"a": [1, 2]})

df = df.with_columns(...) # ????

Solution

  • Let's start with this DataFrame:

    import polars as pl
    
    df = pl.DataFrame(
        {
            "col1": [1, 2, 3, 4, 5],
        }
    )
    
    shape: (5, 1)
    ┌──────┐
    │ col1 │
    │ ---  │
    │ i64  │
    ╞══════╡
    │ 1    │
    │ 2    │
    │ 3    │
    │ 4    │
    │ 5    │
    └──────┘
    

    To add a scalar (single value)

    Use polars.lit, which wraps a Python value in an expression that is broadcast to every row.

    my_scalar = -1
    df.with_columns(pl.lit(my_scalar).alias("col_scalar"))
    
    shape: (5, 2)
    ┌──────┬────────────┐
    │ col1 ┆ col_scalar │
    │ ---  ┆ ---        │
    │ i64  ┆ i32        │
    ╞══════╪════════════╡
    │ 1    ┆ -1         │
    │ 2    ┆ -1         │
    │ 3    ┆ -1         │
    │ 4    ┆ -1         │
    │ 5    ┆ -1         │
    └──────┴────────────┘
    

    You can also choose the datatype of the new column using the dtype keyword.

    df.with_columns(pl.lit(my_scalar, dtype=pl.Float64).alias("col_scalar_float"))
    
    shape: (5, 2)
    ┌──────┬──────────────────┐
    │ col1 ┆ col_scalar_float │
    │ ---  ┆ ---              │
    │ i64  ┆ f64              │
    ╞══════╪══════════════════╡
    │ 1    ┆ -1.0             │
    │ 2    ┆ -1.0             │
    │ 3    ┆ -1.0             │
    │ 4    ┆ -1.0             │
    │ 5    ┆ -1.0             │
    └──────┴──────────────────┘
    

    To add a list

    To add a list of values (perhaps from some external computation), use the polars.Series constructor and give the Series a name; that name becomes the column name.

    my_list = [10, 20, 30, 40, 50]
    df.with_columns(pl.Series(name="col_list", values=my_list))
    
    shape: (5, 2)
    ┌──────┬──────────┐
    │ col1 ┆ col_list │
    │ ---  ┆ ---      │
    │ i64  ┆ i64      │
    ╞══════╪══════════╡
    │ 1    ┆ 10       │
    │ 2    ┆ 20       │
    │ 3    ┆ 30       │
    │ 4    ┆ 40       │
    │ 5    ┆ 50       │
    └──────┴──────────┘
    

    You can use the dtype keyword to control the datatype of the new series, if needed.

    df.with_columns(pl.Series(name="col_list", values=my_list, dtype=pl.Float64))
    
    shape: (5, 2)
    ┌──────┬──────────┐
    │ col1 ┆ col_list │
    │ ---  ┆ ---      │
    │ i64  ┆ f64      │
    ╞══════╪══════════╡
    │ 1    ┆ 10.0     │
    │ 2    ┆ 20.0     │
    │ 3    ┆ 30.0     │
    │ 4    ┆ 40.0     │
    │ 5    ┆ 50.0     │
    └──────┴──────────┘
    

    To add a Series

    If you already have a Series, you can pass it directly to with_columns.

    my_series = pl.Series(name="my_series_name", values=[10, 20, 30, 40, 50])
    df.with_columns(my_series)
    
    shape: (5, 2)
    ┌──────┬────────────────┐
    │ col1 ┆ my_series_name │
    │ ---  ┆ ---            │
    │ i64  ┆ i64            │
    ╞══════╪════════════════╡
    │ 1    ┆ 10             │
    │ 2    ┆ 20             │
    │ 3    ┆ 30             │
    │ 4    ┆ 40             │
    │ 5    ┆ 50             │
    └──────┴────────────────┘
    

    If your Series does not already have a name, you can provide one using the Series.alias method.

    my_series_no_name = pl.Series(values=[10, 20, 30, 40, 50])
    df.with_columns(my_series_no_name.alias('col_no_name'))
    
    shape: (5, 2)
    ┌──────┬─────────────┐
    │ col1 ┆ col_no_name │
    │ ---  ┆ ---         │
    │ i64  ┆ i64         │
    ╞══════╪═════════════╡
    │ 1    ┆ 10          │
    │ 2    ┆ 20          │
    │ 3    ┆ 30          │
    │ 4    ┆ 40          │
    │ 5    ┆ 50          │
    └──────┴─────────────┘