In pandas, we can just assign directly:
import pandas as pd
df = pd.DataFrame({"a": [1, 2]})
# add a single value
df["b"] = 3
# add an existing Series
df["c"] = pd.Series([4, 5])
   a  b  c
0  1  3  4
1  2  3  5
Notice that the new numerical Series is not derived from the original df; it is the result of some separate computation.
How do we do the same thing in polars?
import polars as pl
df = pl.DataFrame({"a": [1, 2]})
df = df.with_columns(...) # ????
Let's start with this DataFrame:
import polars as pl
df = pl.DataFrame(
{
"col1": [1, 2, 3, 4, 5],
}
)
shape: (5, 1)
┌──────┐
│ col1 │
│ --- │
│ i64 │
╞══════╡
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
└──────┘
Use polars.lit to add a scalar value as a new column.
my_scalar = -1
df.with_columns(pl.lit(my_scalar).alias("col_scalar"))
shape: (5, 2)
┌──────┬────────────┐
│ col1 ┆ col_scalar │
│ --- ┆ --- │
│ i64 ┆ i32 │
╞══════╪════════════╡
│ 1 ┆ -1 │
│ 2 ┆ -1 │
│ 3 ┆ -1 │
│ 4 ┆ -1 │
│ 5 ┆ -1 │
└──────┴────────────┘
You can also choose the datatype of the new column using the dtype keyword.
df.with_columns(pl.lit(my_scalar, dtype=pl.Float64).alias("col_scalar_float"))
shape: (5, 2)
┌──────┬──────────────────┐
│ col1 ┆ col_scalar_float │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞══════╪══════════════════╡
│ 1 ┆ -1.0 │
│ 2 ┆ -1.0 │
│ 3 ┆ -1.0 │
│ 4 ┆ -1.0 │
│ 5 ┆ -1.0 │
└──────┴──────────────────┘
To add a list of values (perhaps from some external computation), use the polars.Series constructor and give the Series a name.
my_list = [10, 20, 30, 40, 50]
df.with_columns(pl.Series(name="col_list", values=my_list))
shape: (5, 2)
┌──────┬──────────┐
│ col1 ┆ col_list │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞══════╪══════════╡
│ 1 ┆ 10 │
│ 2 ┆ 20 │
│ 3 ┆ 30 │
│ 4 ┆ 40 │
│ 5 ┆ 50 │
└──────┴──────────┘
You can use the dtype keyword to control the datatype of the new Series, if needed.
df.with_columns(pl.Series(name="col_list", values=my_list, dtype=pl.Float64))
shape: (5, 2)
┌──────┬──────────┐
│ col1 ┆ col_list │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞══════╪══════════╡
│ 1 ┆ 10.0 │
│ 2 ┆ 20.0 │
│ 3 ┆ 30.0 │
│ 4 ┆ 40.0 │
│ 5 ┆ 50.0 │
└──────┴──────────┘
If you already have a named Series, you can pass it directly to with_columns.
my_series = pl.Series(name="my_series_name", values=[10, 20, 30, 40, 50])
df.with_columns(my_series)
shape: (5, 2)
┌──────┬────────────────┐
│ col1 ┆ my_series_name │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞══════╪════════════════╡
│ 1 ┆ 10 │
│ 2 ┆ 20 │
│ 3 ┆ 30 │
│ 4 ┆ 40 │
│ 5 ┆ 50 │
└──────┴────────────────┘
If your Series does not already have a name, you can provide one using the Series.alias method.
my_series_no_name = pl.Series(values=[10, 20, 30, 40, 50])
df.with_columns(my_series_no_name.alias('col_no_name'))
shape: (5, 2)
┌──────┬─────────────┐
│ col1 ┆ col_no_name │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞══════╪═════════════╡
│ 1 ┆ 10 │
│ 2 ┆ 20 │
│ 3 ┆ 30 │
│ 4 ┆ 40 │
│ 5 ┆ 50 │
└──────┴─────────────┘