I have a Polars dataframe pydf in Python with a datetime column time1 that I'm passing to Rust. In Rust I am calling code that works with epoch format, but afterwards it should return another dataframe with a datetime column time2_epoch in the same time unit and time zone as time1.
Currently I perform the conversion to and from epoch on the Python side:
from datetime import datetime
from zoneinfo import ZoneInfo
import polars as pl
pydf = pl.DataFrame({'time1': datetime(2024, 1, 1, tzinfo=ZoneInfo("UTC"))})
time_zone = pydf.schema['time1'].time_zone
time_unit = pydf.schema['time1'].time_unit
pydf_epoch = pydf.with_columns(pl.col('time1').dt.epoch(time_unit))
On the Rust side I have this function accepting the Polars dataframe and the time unit:
fn foo(pydf: PyDataFrame, time_unit: &str) -> PyResult<PyDataFrame> {
    // Work on pydf and return a dataframe with column time2_epoch whose precision is time_unit
}
Back on the Python side I convert time2_epoch to a datetime column:
rustdf.with_columns(
    pl.from_epoch(pl.col('time2_epoch'), time_unit).alias('time2')
).with_columns(
    pl.col('time2').dt.convert_time_zone(time_zone)
)
Is it possible to pass pydf directly to foo and perform the time conversions in Rust?
I can get the type of the time1 column (an enum AFAIK), but I don't know how to extract the time zone and time unit from it:
let schema = pydf.schema();
let time1_type = schema.get("time1").unwrap();
Here's a way to do it. It works on a plain DataFrame, but adapting it should just be a matter of taking the inner DataFrame out of the PyDataFrame (the .0 field if you're using pyo3-polars) and wrapping the result in a PyDataFrame again for the output. I pieced this together from other code of mine, so it's more awkward than it needs to be and could be cleaned up a bit.
use chrono::DateTime;
use polars::prelude::*;

fn foo(df: DataFrame) -> DataFrame {
    let time1 = df.column("time1").unwrap().as_materialized_series();
    let dtype = time1.dtype();
    // Pull the time unit and time zone out of the dtype, along with the
    // ticks per second and nanoseconds per tick for that unit.
    let (time_unit, per_second, ns_per_tick, tz) = match dtype {
        DataType::Datetime(time_unit, tz) => match time_unit {
            TimeUnit::Milliseconds => (*time_unit, 1_000, 1_000_000, tz),
            TimeUnit::Microseconds => (*time_unit, 1_000_000, 1_000, tz),
            TimeUnit::Nanoseconds => (*time_unit, 1_000_000_000, 1, tz),
        },
        _ => panic!("not datetime"),
    };
    let time1 = time1.datetime().unwrap();
    let out_time: Vec<AnyValue> = time1
        .into_iter()
        .map(|from_epoch| match from_epoch {
            Some(from_epoch) => {
                // Euclidean division keeps the sub-second part non-negative for pre-1970 values.
                let seconds = from_epoch.div_euclid(per_second);
                let nanoseconds = from_epoch.rem_euclid(per_second) * ns_per_tick;
                let raw_utc = DateTime::from_timestamp(seconds, nanoseconds as u32).unwrap();
                match tz {
                    Some(tz_str) => {
                        let timezone: chrono_tz::Tz = tz_str.parse().unwrap();
                        let tz_aware = raw_utc.with_timezone(&timezone);
                        // Do your datetime operations on tz_aware here
                        // Convert back to an epoch value in the input's time unit.
                        let out = tz_aware.timestamp() * per_second
                            + i64::from(tz_aware.timestamp_subsec_nanos()) / ns_per_tick;
                        AnyValue::Datetime(out, time_unit, Some(tz_str))
                    }
                    None => {
                        // Do your datetime operations on raw_utc here and let it return up to out_time
                        let out = raw_utc.timestamp() * per_second
                            + i64::from(raw_utc.timestamp_subsec_nanos()) / ns_per_tick;
                        AnyValue::Datetime(out, time_unit, None)
                    }
                }
            }
            None => AnyValue::Null,
        })
        .collect();
    // The values carry the same time unit as dtype, so the strict conversion accepts them.
    let out_column = Series::from_any_values_and_dtype("time2".into(), &out_time, dtype, true)
        .unwrap()
        .into_column();
    let mut out = df.clone();
    out.with_column(out_column).unwrap();
    out
}
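The only fiddly part of the function above is the arithmetic that turns an epoch value into (seconds, nanoseconds) and back, so here is a minimal std-only sketch of just that step. The Unit enum and the split_epoch / join_epoch helpers are hypothetical names made up for illustration (Unit stands in for Polars' TimeUnit so the snippet compiles without any crates); the point is that Euclidean division keeps the nanosecond part non-negative for timestamps before 1970, where plain / and % would give a negative remainder.

```rust
/// Hypothetical stand-in for Polars' `TimeUnit`, so this compiles with std only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Unit {
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

impl Unit {
    /// Number of ticks of this unit in one second.
    fn per_second(self) -> i64 {
        match self {
            Unit::Milliseconds => 1_000,
            Unit::Microseconds => 1_000_000,
            Unit::Nanoseconds => 1_000_000_000,
        }
    }
}

/// Split an epoch value into whole seconds and a non-negative nanosecond part.
fn split_epoch(value: i64, unit: Unit) -> (i64, u32) {
    let per_second = unit.per_second();
    let nanos_per_tick = 1_000_000_000 / per_second;
    // For a positive divisor, div_euclid rounds toward negative infinity and
    // rem_euclid is always in 0..per_second, so the cast to u32 cannot wrap.
    let seconds = value.div_euclid(per_second);
    let nanos = value.rem_euclid(per_second) * nanos_per_tick;
    (seconds, nanos as u32)
}

/// Inverse of `split_epoch`: rebuild the epoch value in the same unit.
fn join_epoch(seconds: i64, nanos: u32, unit: Unit) -> i64 {
    let per_second = unit.per_second();
    let nanos_per_tick = 1_000_000_000 / per_second;
    seconds * per_second + i64::from(nanos) / nanos_per_tick
}
```

The (seconds, nanos) pair is the shape that chrono's DateTime::from_timestamp expects, and join_epoch mirrors what you'd do with timestamp() and timestamp_subsec_nanos() on the way back.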