I am new to polars, and i was trying to convert a flat file datetime type column to a dataframe column with 1 nanosecond precision. For example:
1974-08-08 16:28:51 => 1974-08-08 16:28:51.0
1997-12-12 17:56:19 => 1997-12-12 17:56:19.0
Flat File:
┌─────────────────────┐
│ DATETIME_COL │
╞═════════════════════╡
│ 1974-08-08 16:28:51 │
│ 1997-12-12 17:56:19 │
│ 1986-05-24 09:04:12 │
│ 2008-01-13 05:31:57 │
│ … │
│ 2001-03-24 22:00:13 │
│ 2015-06-16 05:11:43 │
│ 2019-01-15 17:30:22 │
│ 2003-10-25 00:16:40 │
└─────────────────────┘
What I tried?
df = pl.read_csv(...)
df = df.with_columns(
pl.coalesce(
pl.col("DATETIME_COL")
.str.strptime(pl.Datetime, format="%Y-%m-%d %H:%M:%S")
.dt.strftime("%y-%m-%d %H:%M:%S.%3f"),
pl.lit("2000-11-11 00:00:00.0"),
)
)
Output:
┌───────────────────────┐
│ DATETIME_COL │
│ --- │
│ str │
╞═══════════════════════╡
│ 74-08-08 16:28:51.000 │
│ 01-03-24 22:00:13.000 │
│ 19-01-15 17:30:22.000 │
│ 81-12-15 17:43:59.000 │
│ … │
│ 03-10-25 00:16:40.000 │
│ 71-04-23 13:05:49.000 │
│ 86-05-24 09:04:12.000 │
│ 08-01-13 05:31:57.000 │
└───────────────────────┘
Any pointers how to get output with 1 nanosecond precision will be really helpful. Thanks a lot in advance for your time.
Although it doesn't show up on any row of the output, you can use dt.cast_time_unit
to make the column a data type of datetime[ns]
(default is us
), so the nanosecond precision is there on future calculations. For example:
df = df.with_columns(pl.col('DATETIME_COL').dt.cast_time_unit('ns'))
df.with_columns(add=pl.col('DATETIME_COL') + pl.duration(nanoseconds = pl.col('DATETIME_COL').dt.second()))
shape: (8, 2)
┌─────────────────────┬───────────────────────────────┐
│ DATETIME_COL ┆ add │
│ --- ┆ --- │
│ datetime[ns] ┆ datetime[ns] │
╞═════════════════════╪═══════════════════════════════╡
│ 1974-08-08 16:28:51 ┆ 1974-08-08 16:28:51.000000051 │
│ 1997-12-12 17:56:19 ┆ 1997-12-12 17:56:19.000000019 │
│ 1986-05-24 09:04:12 ┆ 1986-05-24 09:04:12.000000012 │
│ 2008-01-13 05:31:57 ┆ 2008-01-13 05:31:57.000000057 │
│ 2001-03-24 22:00:13 ┆ 2001-03-24 22:00:13.000000013 │
│ 2015-06-16 05:11:43 ┆ 2015-06-16 05:11:43.000000043 │
│ 2019-01-15 17:30:22 ┆ 2019-01-15 17:30:22.000000022 │
│ 2003-10-25 00:16:40 ┆ 2003-10-25 00:16:40.000000040 │
└─────────────────────┴───────────────────────────────┘
Another option is directly in the read_csv
call, use the dtypes
keyword argument to specify an exact data type, and pl.Datetime
supports setting the time unit right there and then:
dtypes = {'DATETIME_COL' : pl.Datetime(time_unit = 'ns')}