Python 3.12.3, Polars 1.8.2, polars-lts-cpu 1.10.0, OS: Linux Lite 24.04 (VM)
I have the following code:
import polars as pl
countries = ['usa', 'france', 'japan', 'brazil', 'new_zealand']
calling_codes = [1, 33, 81, 55, 64]
df = pl.DataFrame({'country': countries, 'calling_code': calling_codes })
capitals_dict = {'usa':'washington_dc', 'france': 'paris', 'brazil': 'brasilia'}
I would like to create a new column called capital in df that is filled from the values in capitals_dict whenever the country in df['country'] is one of the keys of capitals_dict.
I have tried using replace:
df.with_columns(capital = pl.col('country').replace(capitals_dict))
shape: (5, 3)
┌─────────────┬──────────────┬───────────────┐
│ country     ┆ calling_code ┆ capital       │
│ ---         ┆ ---          ┆ ---           │
│ str         ┆ i64          ┆ str           │
╞═════════════╪══════════════╪═══════════════╡
│ usa         ┆ 1            ┆ washington_dc │
│ france      ┆ 33           ┆ paris         │
│ japan       ┆ 81           ┆ japan         │
│ brazil      ┆ 55           ┆ brasilia      │
│ new_zealand ┆ 64           ┆ new_zealand   │
└─────────────┴──────────────┴───────────────┘
But this fills the rows for japan and new_zealand with the country name. How would I go about assigning a default value for countries that are in the countries and calling_codes lists but not in capitals_dict, so that I get something like this instead:
shape: (5, 3)
┌─────────────┬──────────────┬───────────────┐
│ country     ┆ calling_code ┆ capital       │
│ ---         ┆ ---          ┆ ---           │
│ str         ┆ i64          ┆ str           │
╞═════════════╪══════════════╪═══════════════╡
│ usa         ┆ 1            ┆ washington_dc │
│ france      ┆ 33           ┆ paris         │
│ japan       ┆ 81           ┆ [default]     │ # <-
│ brazil      ┆ 55           ┆ brasilia      │
│ new_zealand ┆ 64           ┆ [default]     │ # <-
└─────────────┴──────────────┴───────────────┘
Depending on the goal, there are two replace methods:
If you want to keep the original value for a non-match, you can use replace:
df.with_columns(capital = pl.col("country").replace(capitals_dict))
shape: (5, 3)
┌─────────────┬──────────────┬───────────────┐
│ country     ┆ calling_code ┆ capital       │
│ ---         ┆ ---          ┆ ---           │
│ str         ┆ i64          ┆ str           │
╞═════════════╪══════════════╪═══════════════╡
│ usa         ┆ 1            ┆ washington_dc │
│ france      ┆ 33           ┆ paris         │
│ japan       ┆ 81           ┆ japan         │ # non-match unchanged
│ brazil      ┆ 55           ┆ brasilia      │
│ new_zealand ┆ 64           ┆ new_zealand   │ # non-match unchanged
└─────────────┴──────────────┴───────────────┘
If you want to replace with a value of a different dtype, or if you want a default value for non-matches, you can use replace_strict:
df.with_columns(
    pl.col("country").replace_strict(capitals_dict, default="NOT FOUND")
    .alias("capital")
)
shape: (5, 3)
┌─────────────┬──────────────┬───────────────┐
│ country     ┆ calling_code ┆ capital       │
│ ---         ┆ ---          ┆ ---           │
│ str         ┆ i64          ┆ str           │
╞═════════════╪══════════════╪═══════════════╡
│ usa         ┆ 1            ┆ washington_dc │
│ france      ┆ 33           ┆ paris         │
│ japan       ┆ 81           ┆ NOT FOUND     │
│ brazil      ┆ 55           ┆ brasilia      │
│ new_zealand ┆ 64           ┆ NOT FOUND     │
└─────────────┴──────────────┴───────────────┘