Search code examples
pythonpython-polars

Easily convert string column to pl.datetime in Polars


Consider a Polars data frame with a column of str type that indicates the date in the format '27 July 2020'.

I would like to convert this column to the polars.datetime type, which is distinct from the Python standard datetime.

import polars as pl
from datetime import datetime

df = pl.DataFrame({
    "id": [1, 2], 
     "event_date": ["27 July 2020", "31 December 2020"]
})

df = df.with_columns( 
    pl.col("event_date").map_elements(lambda x: x.replace(" ", "-"))
                        .map_elements(lambda x: datetime.strptime(x, "%d-%B-%Y"))
)
shape: (2, 2)
┌─────┬─────────────────────┐
│ id  ┆ event_date          │
│ --- ┆ ---                 │
│ i64 ┆ datetime[μs]        │
╞═════╪═════════════════════╡
│ 1   ┆ 2020-07-27 00:00:00 │
│ 2   ┆ 2020-12-31 00:00:00 │
└─────┴─────────────────────┘

Suppose we try to process df further to create a new column indicating the quarter of the year an event took place.

df.with_columns(
    pl.col("event_date").map_elements(lambda x: x.month)
                        .map_elements(lambda x: 1 if x in range(1,4) else 2 if x in range(4,7) else 3 if x in range(7,10) else 4)
                        .alias("quarter")
)
shape: (2, 3)
┌─────┬─────────────────────┬─────────┐
│ id  ┆ event_date          ┆ quarter │
│ --- ┆ ---                 ┆ ---     │
│ i64 ┆ datetime[μs]        ┆ i64     │
╞═════╪═════════════════════╪═════════╡
│ 1   ┆ 2020-07-27 00:00:00 ┆ 3       │
│ 2   ┆ 2020-12-31 00:00:00 ┆ 4       │
└─────┴─────────────────────┴─────────┘

How would I do this in Polars without applying custom lambdas through map_elements?


Solution

  • The easiest way to convert strings to Date/Datetime is to use Polars' own functions:

    df.with_columns( 
        pl.col("event_date").str.to_datetime("%d %B %Y")
    )
    
    shape: (2, 2)
    ┌─────┬─────────────────────┐
    │ id  ┆ event_date          │
    │ --- ┆ ---                 │
    │ i64 ┆ datetime[μs]        │
    ╞═════╪═════════════════════╡
    │ 1   ┆ 2020-07-27 00:00:00 │
    │ 2   ┆ 2020-12-31 00:00:00 │
    └─────┴─────────────────────┘
    

    The Temporal section of the docs shows the supported functions in the .dt namespace.

    In the case of your second example, there is a dedicated quarter expression:

    df = df.with_columns( 
        pl.col("event_date").str.to_datetime("%d %B %Y")
    ).with_columns(
        pl.col("event_date").dt.quarter().alias("quarter")
    )
    
    shape: (2, 3)
    ┌─────┬─────────────────────┬─────────┐
    │ id  ┆ event_date          ┆ quarter │
    │ --- ┆ ---                 ┆ ---     │
    │ i64 ┆ datetime[μs]        ┆ i8      │
    ╞═════╪═════════════════════╪═════════╡
    │ 1   ┆ 2020-07-27 00:00:00 ┆ 3       │
    │ 2   ┆ 2020-12-31 00:00:00 ┆ 4       │
    └─────┴─────────────────────┴─────────┘