In Polars we can select columns by their type -- see e.g. How to select columns by data type in Polars?
However, I cannot figure out how to select datetime
columns. From Polars' tutorial we have this example:
import polars as pl
import datetime at dt
df_trades = pl.DataFrame(
{
"time": [
dt.datetime(2020, 1, 1, 9, 1, 0),
dt.datetime(2020, 1, 1, 9, 1, 0),
dt.datetime(2020, 1, 1, 9, 3, 0),
dt.datetime(2020, 1, 1, 9, 6, 0),
],
"stock": ["A", "B", "B", "C"],
"trade": [101, 299, 301, 500],
}
)
The column types are printed like this:
┌─────────────────────┬───────┬───────┐
│ time ┆ stock ┆ trade │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ i64 │
For the stock
column we can use
df_trades.select(pl.col(pl.String))
But what is the type for the time
column? Using dt.datetime
instead of pl.String
gives an error:
TypeError: argument 'name': 'type' object cannot be converted to 'PyString'
You can select polars columns by datatypes as you did with pl.String
, but you have to pass a polars
datatype to make that work, not a python datetime.datetime
value/object.
Below we select all datetime datatypes that have "us"
resolution by passing the pl.Datetime("us")
datatype.
import polars as pl
from datetime import datetime
df_trades = pl.DataFrame(
{
"time1": [
datetime(2020, 1, 1, 9, 1, 0),
datetime(2020, 1, 1, 9, 1, 0),
datetime(2020, 1, 1, 9, 3, 0),
datetime(2020, 1, 1, 9, 6, 0),
],
"time2": [
datetime(2020, 1, 1, 9, 1, 0),
datetime(2020, 1, 1, 9, 1, 0),
datetime(2020, 1, 1, 9, 3, 0),
datetime(2020, 1, 1, 9, 6, 0),
],
"stock": ["A", "B", "B", "C"],
"trade": [101, 299, 301, 500],
}
)
df_trades.select(
pl.col(pl.Datetime("us"))
)
shape: (4, 2)
┌─────────────────────┬─────────────────────┐
│ time1 ┆ time2 │
│ --- ┆ --- │
│ datetime[μs] ┆ datetime[μs] │
╞═════════════════════╪═════════════════════╡
│ 2020-01-01 09:01:00 ┆ 2020-01-01 09:01:00 │
│ 2020-01-01 09:01:00 ┆ 2020-01-01 09:01:00 │
│ 2020-01-01 09:03:00 ┆ 2020-01-01 09:03:00 │
│ 2020-01-01 09:06:00 ┆ 2020-01-01 09:06:00 │
└─────────────────────┴─────────────────────┘