I have the following code and output.
Code.
import polars as pl
df = pl.DataFrame({
'type': ['A', 'O', 'B', 'O'],
'id': ['CASH', 'ORB.A123', 'CHECK', 'OTC.BV32']
})
df.with_columns(sub_id=pl.when(pl.col('type') == 'O').then(pl.col('id').str.split('.')).otherwise(None))
Output.
shape: (4, 3)
┌──────┬──────────┬─────────────────┐
│ type ┆ id ┆ sub_id │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ list[str] │
╞══════╪══════════╪═════════════════╡
│ A ┆ CASH ┆ null │
│ O ┆ ORB.A123 ┆ ["ORB", "A123"] │
│ B ┆ CHECK ┆ null │
│ O ┆ OTC.BV32 ┆ ["OTC", "BV32"] │
└──────┴──────────┴─────────────────┘
Now, how would I extract the n-th element (or in this case, the last element) of each list?
Especially, the expected output is the following.
shape: (4, 3)
┌──────┬──────────┬────────────┐
│ type ┆ id ┆ sub_id │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════╪══════════╪════════════╡
│ A ┆ CASH ┆ null │
│ O ┆ ORB.A123 ┆ "A123" │
│ B ┆ CHECK ┆ null │
│ O ┆ OTC.BV32 ┆ "BV32" │
└──────┴──────────┴────────────┘
You can simply append .list.last()
to select the last element of each list.
Alternatively, there exists .list.get()
to get list elements by index.
import polars as pl
df = pl.DataFrame({
'type': ['A', 'O', 'B', 'O'],
'id': ['CASH', 'ORB.A123', 'CHECK', 'OTC.BV32']
})
df.with_columns(
sub_id=pl.when(
pl.col('type') == 'O'
).then(
pl.col('id').str.split('.').list.last()
)
)
shape: (4, 3)
┌──────┬──────────┬────────┐
│ type ┆ id ┆ sub_id │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════╪══════════╪════════╡
│ A ┆ CASH ┆ null │
│ O ┆ ORB.A123 ┆ A123 │
│ B ┆ CHECK ┆ null │
│ O ┆ OTC.BV32 ┆ BV32 │
└──────┴──────────┴────────┘
Note that I've dropped .otherwise(None)
as this is the default behaviour of an if-then-otherwise expression.