
Cast pl.Date to Unix epoch


I'm trying to convert a pl.Date column to a Unix epoch as is, without any time zone offset:

import datetime
import polars as pl

df = pl.DataFrame(
    {'Date': [datetime.datetime.now().date()]}
)

The time is correct (00:00:00) when cast to Datetime:

df.with_columns(
    pl.col("Date").cast(pl.Datetime)
)
┌─────────────────────┐
│ Date                │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2023-06-10 00:00:00 │
└─────────────────────┘
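Polars stores a pl.Date as a count of days since 1970-01-01, so midnight of that date in epoch seconds is just days × 86,400, with no time zone involved. Below is a minimal stdlib sketch of that arithmetic. `date_to_epoch` is a hypothetical helper for illustration, not part of Polars:

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def date_to_epoch(d: date) -> int:
    """Seconds since 1970-01-01 00:00:00 UTC for midnight of `d` (no tz offset)."""
    # A calendar date is a whole number of days after the epoch; each day is 86,400 s.
    return (d - EPOCH).days * 86_400

print(date_to_epoch(date(2023, 6, 10)))  # 1686355200
```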

The time is wrong after converting to a timestamp:

datetime.datetime.fromtimestamp(
    df.with_columns(
        pl.col("Date").cast(pl.Datetime).dt.timestamp("ms").truediv(1_000)
    ).item()
)
datetime.datetime(2023, 6, 10, 8, 0) # (08:00:00)

As suggested, skipping the cast to Datetime still produces the incorrect time (08:00:00):

pl.col("Date").dt.timestamp("ms").truediv(1_000)
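The 8-hour shift suggests that the epoch value itself is fine and only its rendering differs. The stdlib sketch below shows the same instant read in UTC and in a fixed UTC+8 zone. The UTC+8 zone is an assumption standing in for the asker's local time zone, inferred from the 08:00 result:

```python
from datetime import datetime, timedelta, timezone

ts = 1_686_355_200  # epoch seconds for 2023-06-10 00:00:00 UTC

# Interpreted in UTC, the timestamp is exactly midnight.
print(datetime.fromtimestamp(ts, timezone.utc))
# 2023-06-10 00:00:00+00:00

# Interpreted in a UTC+8 zone (assumed stand-in for the local time zone),
# the same instant reads 08:00, which is what fromtimestamp() without a tz shows there.
utc_plus_8 = timezone(timedelta(hours=8))
print(datetime.fromtimestamp(ts, utc_plus_8))
# 2023-06-10 08:00:00+08:00
```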

Solution

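  • Note that vanilla Python datetime treats a naive datetime (one without a time zone) as local time; for example, datetime.fromtimestamp without a tz argument returns local time. In contrast, polars treats naive datetimes as UTC (as pandas does as well). So the epoch value polars computes is already midnight UTC; the 08:00:00 only appears when fromtimestamp renders it in your local time zone.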

    Keep it consistent by setting the time zone, e.g. UTC:

    from datetime import datetime, timezone
    import polars as pl
    
    df = pl.DataFrame(
        {'Date': [datetime.now(timezone.utc).date()]}
    )
    
    df = df.with_columns(
        pl.col("Date").cast(pl.Datetime).dt.timestamp("ms").truediv(1_000).alias("Unix")
    )
    
    print(df)
    # shape: (1, 2)
    # ┌────────────┬──────────┐
    # │ Date       ┆ Unix     │
    # │ ---        ┆ ---      │
    # │ date       ┆ f64      │
    # ╞════════════╪══════════╡
    # │ 2023-06-10 ┆ 1.6864e9 │
    # └────────────┴──────────┘
    
    print(datetime.fromtimestamp(df["Unix"][0], timezone.utc))
    # 2023-06-10 00:00:00+00:00
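If the values end up back in plain Python, the same rule applies in both directions: attach timezone.utc explicitly. Below is a minimal stdlib sketch; `date_to_unix` and `unix_to_date` are hypothetical helper names used only for illustration:

```python
from datetime import date, datetime, time, timezone

def date_to_unix(d: date) -> float:
    """Epoch seconds for midnight UTC of `d`, independent of the machine's local zone."""
    # Attach UTC explicitly so .timestamp() does not fall back to local time.
    return datetime.combine(d, time.min, tzinfo=timezone.utc).timestamp()

def unix_to_date(ts: float) -> date:
    """Inverse of date_to_unix: read the timestamp back in UTC, not local time."""
    return datetime.fromtimestamp(ts, timezone.utc).date()

d = date(2023, 6, 10)
ts = date_to_unix(d)
print(ts)                # 1686355200.0
print(unix_to_date(ts))  # 2023-06-10
```

Because both helpers pin the zone to UTC, the round trip gives the same date regardless of where the code runs.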