Search code examples
pythondatetimecastingpython-polarsepoch

Cast multiple columns with Unix epoch to Datetime


I have a dataframe with multiple columns containing unix epochs. In my example I only use 2 of 13 columns I have. I'd like to cast all those columns to a datetime with UTC timezone in a single call to with_columns().

df = pl.from_repr("""
┌─────┬────────────┬────────────┬────────────┐
│ id  ┆ start_date ┆ end_date   ┆ cancelable │
│ --- ┆ ---        ┆ ---        ┆ ---        │
│ i64 ┆ i64        ┆ i64        ┆ bool       │
╞═════╪════════════╪════════════╪════════════╡
│ 1   ┆ 1566637530 ┆ 1566628686 ┆ true       │
│ 2   ┆ 1561372720 ┆ 1561358079 ┆ true       │
│ 3   ┆ 1561374780 ┆ 1561358135 ┆ false      │
│ 4   ┆ 1558714718 ┆ 1556188225 ┆ false      │
│ 5   ┆ 1558715044 ┆ 1558427697 ┆ true       │
└─────┴────────────┴────────────┴────────────┘
""")

Polars provides the user with pl.from_epoch. However, I didn't find a way to apply it to multiple columns as once.

Expected result:

shape: (5, 4)
┌─────┬─────────────────────┬─────────────────────┬────────────┐
│ id  ┆ start_date          ┆ end_date            ┆ cancelable │
│ --- ┆ ---                 ┆ ---                 ┆ ---        │
│ i64 ┆ datetime[μs]        ┆ datetime[μs]        ┆ bool       │
╞═════╪═════════════════════╪═════════════════════╪════════════╡
│ 1   ┆ 2019-08-24 09:05:30 ┆ 2019-08-24 06:38:06 ┆ true       │
│ 2   ┆ 2019-06-24 10:38:40 ┆ 2019-06-24 06:34:39 ┆ true       │
│ 3   ┆ 2019-06-24 11:13:00 ┆ 2019-06-24 06:35:35 ┆ false      │
│ 4   ┆ 2019-05-24 16:18:38 ┆ 2019-04-25 10:30:25 ┆ false      │
│ 5   ┆ 2019-05-24 16:24:04 ┆ 2019-05-21 08:34:57 ┆ true       │
└─────┴─────────────────────┴─────────────────────┴────────────┘

So far, my code looks as follows.

columns_epoch_to_timestamp: list[str] = [
    "start_date",
    "end_date",
]

df = df.with_columns(pl.col(*columns_epoch_to_timestamp))

Solution

  • pl.from_epoch() accepts a single column name, but also Expr objects.

    column: str | Expr | Series | Sequence[int]

    pl.col() can select multiple columns, which we can pass directly:

    df.with_columns(pl.from_epoch(pl.col("start_date", "end_date")))
    
    shape: (5, 4)
    ┌─────┬─────────────────────┬─────────────────────┬────────────┐
    │ id  ┆ start_date          ┆ end_date            ┆ cancelable │
    │ --- ┆ ---                 ┆ ---                 ┆ ---        │
    │ i64 ┆ datetime[μs]        ┆ datetime[μs]        ┆ bool       │
    ╞═════╪═════════════════════╪═════════════════════╪════════════╡
    │ 1   ┆ 2019-08-24 09:05:30 ┆ 2019-08-24 06:38:06 ┆ true       │
    │ 2   ┆ 2019-06-24 10:38:40 ┆ 2019-06-24 06:34:39 ┆ true       │
    │ 3   ┆ 2019-06-24 11:13:00 ┆ 2019-06-24 06:35:35 ┆ false      │
    │ 4   ┆ 2019-05-24 16:18:38 ┆ 2019-04-25 10:30:25 ┆ false      │
    │ 5   ┆ 2019-05-24 16:24:04 ┆ 2019-05-21 08:34:57 ┆ true       │
    └─────┴─────────────────────┴─────────────────────┴────────────┘