
Create new Series from existing Series and map values


My use case: I have a column of irregularly formatted timestamps (strings), and I want to parse them into a timestamp type.

The docs describe ChunkedArray as the typed container for my strings.

However, I cannot complete the picture.

fn with_timestamps(mut df: DataFrame) -> Result<DataFrame, PolarsError> {
    let column = df.column("myTime")?.clone(); // clone, just for a good measure ...

    let ca = column.utf8()?; // ChunkedArray

    // I think I want something like this:
    let new_time = Series::new("newTime", ca.into_iter().map(|v: &str| 42).collect());

    // 42 is not a timestamp,
    // but maybe I can work on that from an integer

    df.with_column(new_time);
    df
}

Apart from needing to find out whether .with_column() acts in place, I have trouble identifying from the docs what I am supposed to iterate over. Is it the Series or the ChunkedArray? And do I construct the new Series from a new ChunkedArray, or from an iterator that I can collect()?

Edit

I also found this answer, and after a bit of fighting I came up with this example, which works in my case:

let df = df!(
    "Fruit" => &["Apple", "Apple", "Pear"],
    "Color" => &["Red", "Yellow", "Green"],
    "Date" => &["02/21/2022 07:51:00 AM", "2/21/2022 07:51:00 AM", "2/21/2022 07:51:00 AM"]
)?;
let options = StrpTimeOptions {
    fmt: Some("%-m/%-d/%Y %I:%M:%S %p".into()),
    date_dtype: polars::datatypes::DataType::Datetime(TimeUnit::Milliseconds, None),
    exact: true,
    ..Default::default()
};

let foo = df
    .clone()
    .lazy()
    .with_columns([
        col("Date")
            .str()
            .strptime(options)
            .alias("parsed date")
    ])
    .collect();

Please note the .lazy(). Without it, an Expr such as col("Foo").alias("bar") does not seem to be directly usable: it is not a Series, which is what the eager API expects, whereas the lazy API accepts expressions. My understanding of the Rust compiler message is not sufficient at the moment to figure out why, or what the idiomatic way would be.


Solution

  • So I think this may be what you want:

    let parsed_time: Series = df
        .column("myTime")?
        .clone()
        .utf8()?
        .into_iter()
        .map(|v: Option<&str>| your_parse_fn(v))
        .collect();

    // with_column returns a Result, so propagate errors with `?`
    df.with_column(parsed_time)?;
    Ok(df)
    

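    If I am correct, with_column modifies df in place and replaces any existing column that has the same name as the new Series, so THIS CAN OVERWRITE YOUR DATA!

    Comment and let me know.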
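
    For illustration, here is a minimal sketch of what the placeholder your_parse_fn could look like. Everything in it is an assumption rather than part of polars: it only handles the "M/D/YYYY HH:MM:SS AM|PM" layout from the Date column in the question, it returns milliseconds since the Unix epoch as Option<i64> (to line up with a millisecond Datetime column), it ignores time zones, and it does not check the day against the length of the month. The helpers days_from_civil and parse_column are hypothetical names, and a plain Vec stands in for the Series so that the sketch needs nothing beyond the standard library.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (the well-known "days from civil" calculation).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    // Treat January and February as the last months of the previous year.
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Hypothetical stand-in for `your_parse_fn`: parses "M/D/YYYY HH:MM:SS AM|PM"
/// into milliseconds since the Unix epoch. Returns None for a null value or
/// for input that does not match the assumed layout.
fn your_parse_fn(v: Option<&str>) -> Option<i64> {
    let mut parts = v?.split_whitespace();
    let (date, time, meridiem) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() {
        return None; // trailing tokens: not the layout we assume
    }

    let mut d = date.split('/').map(|s| s.parse::<i64>().ok());
    let (month, day, year) = (d.next()??, d.next()??, d.next()??);
    let mut t = time.split(':').map(|s| s.parse::<i64>().ok());
    let (hour12, min, sec) = (t.next()??, t.next()??, t.next()??);

    // Coarse range checks only; the day is not validated per month.
    let valid = (1..=12).contains(&month)
        && (1..=31).contains(&day)
        && (1..=12).contains(&hour12)
        && (0..=59).contains(&min)
        && (0..=59).contains(&sec);
    if !valid {
        return None;
    }

    // Convert the 12-hour clock to a 24-hour clock (12 AM is hour 0).
    let hour = match meridiem {
        "AM" => hour12 % 12,
        "PM" => hour12 % 12 + 12,
        _ => return None,
    };

    let secs = days_from_civil(year, month, day) * 86_400 + hour * 3_600 + min * 60 + sec;
    Some(secs * 1_000)
}

/// Maps every optional string to an optional timestamp, keeping nulls as None.
/// This mirrors the `.into_iter().map(..).collect()` chain above, with a Vec
/// standing in for the resulting Series.
fn parse_column(values: &[Option<&str>]) -> Vec<Option<i64>> {
    values.iter().map(|v| your_parse_fn(*v)).collect()
}
```

    With a function of this shape, the .map(...) call above yields Option<i64> values, so nulls and unparseable strings both end up as nulls in the collected result. In real code a date library would probably be a better choice than hand-rolled parsing.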