I'm attempting to determine the time based on the timezone specified in each row using Polars
. Consider the following code snippet:
import polars as pl
from datetime import datetime
from polars import col as c
df = pl.DataFrame({
"time": [datetime(2023, 4, 3, 2), datetime(2023, 4, 4, 3), datetime(2023, 4, 5, 4)],
"tzone": ["Asia/Tokyo", "America/Chicago", "Europe/Paris"]
}).with_columns(c.time.dt.replace_time_zone("UTC"))
df.with_columns(
tokyo=c.time.dt.convert_time_zone("Asia/Tokyo").dt.hour(),
chicago=c.time.dt.convert_time_zone("America/Chicago").dt.hour(),
paris=c.time.dt.convert_time_zone("Europe/Paris").dt.hour()
)
In this example, I've computed the time separately for each timezone to achieve the desired outcome, which is [11, 22, 6], corresponding to the hour of the time
column according to the tzone
timezone. Even then it is difficult to collect the information from the correct column.
Unfortunately, the following simple attempt to dynamically pass the timezone from the tzone
column directly into the convert_time_zone
function does not work:
df.with_columns(c.time.dt.convert_time_zone(c.tzone).dt.hour())
# TypeError: argument 'time_zone': 'Expr' object cannot be converted to 'PyString'
What would be the most elegant approach to accomplish this task?
The only way to do this which fully works with lazy execution is to use the polars-xdt
plugin:
df = pl.DataFrame(
{
"time": [
datetime(2023, 4, 3, 2),
datetime(2023, 4, 4, 3),
datetime(2023, 4, 5, 4),
],
"tzone": ["Asia/Tokyo", "America/Chicago", "Europe/Paris"],
}
).with_columns(pl.col("time").dt.replace_time_zone("UTC"))
df.with_columns(
result=xdt.to_local_datetime("time", pl.col("tzone")).dt.hour(),
)
Result:
Out[6]:
shape: (3, 3)
┌─────────────────────────┬─────────────────┬────────┐
│ time ┆ tzone ┆ result │
│ --- ┆ --- ┆ --- │
│ datetime[μs, UTC] ┆ str ┆ i8 │
╞═════════════════════════╪═════════════════╪════════╡
│ 2023-04-03 02:00:00 UTC ┆ Asia/Tokyo ┆ 11 │
│ 2023-04-04 03:00:00 UTC ┆ America/Chicago ┆ 22 │
│ 2023-04-05 04:00:00 UTC ┆ Europe/Paris ┆ 6 │
└─────────────────────────┴─────────────────┴────────┘
https://github.com/pola-rs/polars-xdt
If you don't need lazy execution, then as other answers have suggested, you can iterate over the unique values of your 'time_zone'
column