Search code examples
python-polars

Select datetime columns in Polars


In Polars we can select columns by their type -- see e.g. How to select columns by data type in Polars?

However, I cannot figure out how to select datetime columns. From Polars' tutorial we have this example:

import polars as pl
import datetime at dt

df_trades = pl.DataFrame(
    {
        "time": [
            dt.datetime(2020, 1, 1, 9, 1, 0),
            dt.datetime(2020, 1, 1, 9, 1, 0),
            dt.datetime(2020, 1, 1, 9, 3, 0),
            dt.datetime(2020, 1, 1, 9, 6, 0),
        ],
        "stock": ["A", "B", "B", "C"],
        "trade": [101, 299, 301, 500],
    }
)

The column types are printed like this:

┌─────────────────────┬───────┬───────┐
│ time                ┆ stock ┆ trade │
│ ---                 ┆ ---   ┆ ---   │
│ datetime[μs]        ┆ str   ┆ i64   │

For the stock column we can use

df_trades.select(pl.col(pl.Utf8))

But what is the type for the time column? Using dt.datetime instead of pl.Utf8 gives an error:

TypeError: argument 'name': 'type' object cannot be converted to 'PyString'


Solution

  • You can select polars columns by datatypes as you did with pl.Utf8, but you have to pass a polars datatype to make that work, not a python datetime.datetime value/object.

    Below we select all datetime datatypes that have "us" resolution by passing the pl.Datetime("us") datatype.

    import polars as pl
    from datetime import datetime
    
    df_trades = pl.DataFrame(
        {
            "time1": [
                datetime(2020, 1, 1, 9, 1, 0),
                datetime(2020, 1, 1, 9, 1, 0),
                datetime(2020, 1, 1, 9, 3, 0),
                datetime(2020, 1, 1, 9, 6, 0),
            ],
            "time2": [
                datetime(2020, 1, 1, 9, 1, 0),
                datetime(2020, 1, 1, 9, 1, 0),
                datetime(2020, 1, 1, 9, 3, 0),
                datetime(2020, 1, 1, 9, 6, 0),
            ],
            "stock": ["A", "B", "B", "C"],
            "trade": [101, 299, 301, 500],
        }
    )
    
    df_trades.select(
        pl.col(pl.Datetime("us"))
    )
    
    shape: (4, 2)
    ┌─────────────────────┬─────────────────────┐
    │ time1               ┆ time2               │
    │ ---                 ┆ ---                 │
    │ datetime[μs]        ┆ datetime[μs]        │
    ╞═════════════════════╪═════════════════════╡
    │ 2020-01-01 09:01:00 ┆ 2020-01-01 09:01:00 │
    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
    │ 2020-01-01 09:01:00 ┆ 2020-01-01 09:01:00 │
    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
    │ 2020-01-01 09:03:00 ┆ 2020-01-01 09:03:00 │
    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
    │ 2020-01-01 09:06:00 ┆ 2020-01-01 09:06:00 │
    └─────────────────────┴─────────────────────┘