python-polars rust-polars

Get Polars datetime unit and time zone in Rust


I have a Polars dataframe pydf in Python with a datetime column time1 that I'm passing to Rust. In Rust I am calling code that works with epoch format, but afterwards it should return another dataframe with a datetime column time2_epoch in the same time unit and time zone as time1.

Currently I perform the conversion to and from epoch on the Python side.

from datetime import datetime
from zoneinfo import ZoneInfo
import polars as pl

pydf = pl.DataFrame({'time1': datetime(2024, 1, 1, tzinfo=ZoneInfo("UTC"))})

time_zone = pydf.schema['time1'].time_zone
time_unit = pydf.schema['time1'].time_unit

pydf_epoch = pydf.with_columns(pl.col('time1').dt.epoch(time_unit))

On the Rust side I have this function, which accepts the Polars dataframe and the time unit:

fn foo(pydf: PyDataFrame, time_unit: &str) -> PyResult<PyDataFrame> {
    // Work on pydf and return a dataframe with column time2_epoch whose precision is time_unit
    todo!()
}

Back on the Python side I convert time2_epoch to a datetime column:

rustdf.with_columns(
    pl.from_epoch(pl.col('time2_epoch'), time_unit).alias('time2')
).with_columns(
    pl.col('time2').dt.convert_time_zone(time_zone)
)

Is it possible to pass pydf directly to foo and perform the time conversions in Rust?

I can get the type of the time1 column (an enum, AFAIK), but I don't know how to extract the time zone and time unit from it:

let schema = pydf.schema();
let time1_type = schema.get("time1").unwrap();

Solution

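  • Here's one way to do it. The example takes and returns a plain DataFrame; with a PyDataFrame it should just be a matter of taking .df on the way in and wrapping the result with PyDataFrame::new() on the way out. The key step is matching on the column's dtype: the DataType::Datetime variant carries both the time unit and the optional time zone. The code is stitched together from other snippets of mine, so it looks more awkward than it needs to and can be tidied up.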

    use chrono::{DateTime, Utc};
    use polars::prelude::*;

    fn foo(df: DataFrame) -> DataFrame {
        let time1 = df.column("time1").unwrap().as_materialized_series();
        let dtype = time1.dtype();
        // DataType::Datetime carries both the time unit and the optional time zone.
        let (time_unit, ticks_per_second, ns_per_tick, tz) = match dtype {
            DataType::Datetime(time_unit, tz) => match time_unit {
                TimeUnit::Milliseconds => (*time_unit, 1_000, 1_000_000, tz),
                TimeUnit::Microseconds => (*time_unit, 1_000_000, 1_000, tz),
                TimeUnit::Nanoseconds => (*time_unit, 1_000_000_000, 1, tz),
            },
            _ => panic!("not datetime"),
        };
        // Convert a chrono value back to an epoch in the same unit as time1.
        let to_epoch = |dt: DateTime<Utc>| match time_unit {
            TimeUnit::Milliseconds => dt.timestamp_millis(),
            TimeUnit::Microseconds => dt.timestamp_micros(),
            TimeUnit::Nanoseconds => dt.timestamp_nanos_opt().unwrap(),
        };
        let time1 = time1.datetime().unwrap();
        let out_time: Vec<AnyValue> = time1
            .into_iter()
            .map(|from_epoch| match from_epoch {
                Some(from_epoch) => {
                    // Euclidean division keeps the nanosecond part non-negative before 1970.
                    let seconds = from_epoch.div_euclid(ticks_per_second);
                    let nanoseconds = from_epoch.rem_euclid(ticks_per_second) * ns_per_tick;
                    let raw_utc = DateTime::from_timestamp(seconds, nanoseconds as u32).unwrap();
                    match tz {
                        Some(tz_str) => {
                            let timezone: chrono_tz::Tz = tz_str.parse().unwrap();
                            let tz_aware = raw_utc.with_timezone(&timezone);
                            // Do your datetime operations on tz_aware here
                            AnyValue::Datetime(to_epoch(tz_aware.with_timezone(&Utc)), time_unit, Some(tz_str))
                        }
                        None => {
                            // Do your datetime operations on raw_utc here
                            AnyValue::Datetime(to_epoch(raw_utc), time_unit, None)
                        }
                    }
                }
                None => AnyValue::Null,
            })
            .collect();
        // The values are in time1's unit, so they match the dtype passed here.
        let out_column = Series::from_any_values_and_dtype("time2".into(), &out_time, dtype, true)
            .unwrap()
            .into_column();
        let mut out = df.clone();
        out.with_column(out_column).unwrap();
        out
    }
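
The arithmetic that splits an epoch value into whole seconds plus a nanosecond remainder (the two arguments chrono's DateTime::from_timestamp expects) can be checked on its own, without Polars or chrono. The following is a minimal sketch using only the standard library. Unit, split_epoch and join_epoch are hypothetical names made up for this illustration; Unit is a stand-in for, not the same type as, Polars' TimeUnit. It uses Euclidean division so that the remainder stays non-negative for timestamps before 1970.

```rust
/// Hypothetical stand-in for a datetime precision (not Polars' `TimeUnit`).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Unit {
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

impl Unit {
    /// Number of ticks of this unit in one second.
    fn ticks_per_second(self) -> i64 {
        match self {
            Unit::Milliseconds => 1_000,
            Unit::Microseconds => 1_000_000,
            Unit::Nanoseconds => 1_000_000_000,
        }
    }
}

/// Split an epoch value into (whole seconds, nanoseconds within that second).
/// `div_euclid`/`rem_euclid` round toward negative infinity, so the
/// nanosecond part is always in `0..1_000_000_000`, even for negative epochs.
#[allow(dead_code)]
fn split_epoch(epoch: i64, unit: Unit) -> (i64, u32) {
    let per_sec = unit.ticks_per_second();
    let ns_per_tick = 1_000_000_000 / per_sec;
    let secs = epoch.div_euclid(per_sec);
    let nanos = epoch.rem_euclid(per_sec) * ns_per_tick;
    (secs, nanos as u32)
}

/// Inverse of `split_epoch`: rebuild the epoch value in the same unit.
/// Any sub-tick nanoseconds are truncated.
#[allow(dead_code)]
fn join_epoch(secs: i64, nanos: u32, unit: Unit) -> i64 {
    let per_sec = unit.ticks_per_second();
    let ns_per_tick = 1_000_000_000 / per_sec;
    secs * per_sec + i64::from(nanos) / ns_per_tick
}
```

For example, split_epoch(-1, Unit::Milliseconds) gives (-1, 999_000_000), whereas plain / and % would give 0 seconds and a negative remainder of -1_000_000 nanoseconds, which does not survive a cast to u32.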