Tags: dataframe, rust, rust-polars

How to convert a whole column of date strings to integers


I have a DataFrame whose first column, `time`, contains date strings.

I would like to convert that `time` column of date strings into a column containing only numbers in the format "YYYYMMDD" (e.g. 20230531), stored as `u64`.

I tried writing a function to do this, but I am struggling, especially with removing the hyphens from the date strings.
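A minimal std-only sketch of the string step on its own, with no Polars involved: `str::replace` returns a new `String` with every hyphen removed, and `str::parse::<u64>` turns the remaining digits into a number. The helper name `date_str_to_u64` is hypothetical, chosen for illustration.

```rust
/// Hypothetical helper: turn a date string such as "2023-05-31" into 20230531.
/// Returns None if, after removing hyphens, the text is not a valid u64.
fn date_str_to_u64(date: &str) -> Option<u64> {
    // `str::replace` builds a new String with every '-' removed
    let digits = date.replace('-', "");
    // `parse` fails (mapped to None) on empty strings or non-digit characters
    digits.parse::<u64>().ok()
}

fn main() {
    let dates = ["2023-05-31", "2023-06-01", "not-a-date"];
    let numbers: Vec<Option<u64>> = dates.iter().map(|d| date_str_to_u64(d)).collect();
    println!("{:?}", numbers); // [Some(20230531), Some(20230601), None]
}
```

This does not check the shape of the input, so a non-zero-padded date like "2023-5-31" would quietly become 2023531; it assumes the column always uses the zero-padded `YYYY-MM-DD` form.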

pub fn convert_str_to_num(df: &DataFrame) -> Result<DataFrame, PolarsError> {
    let mut final_df = df.clone();
    let col_name = String::from("time");
    let time_col = df.column(col_name.as_str())?;
    let mut new_time_col = time_col.clone().replace("-", "")?;
    // replace the old column with the new one
    final_df.replace(col_name.as_str(), new_time_col.as_mut())?;
    Ok(final_df)
}

This fails to compile with:

error[E0599]: no method named `replace` found for struct `polars::prelude::Series` in the current scope
  --> src/main.rs:13:45
   |
13 |     let mut new_time_col = time_col.clone().replace("-", "")?;
   |                                             ^^^^^^^ method not found in `Series`

Solution

Turns out I solved my own question:

    use polars::prelude::*;

    /// Replace a column of "YYYY-MM-DD" strings with UInt32 values of the form YYYYMMDD.
    fn convert_str_to_int(mut df: DataFrame, date_col_name: &str) -> Result<DataFrame, PolarsError> {
        // Get the date column and view it as a string column
        // (`.utf8()` is the accessor in older Polars; newer releases call it `.str()`)
        let dates = df.column(date_col_name)?.utf8()?;
        // Strip the hyphens and parse each date as u32 (YYYYMMDD fits easily in 32 bits);
        // nulls and unparsable strings become nulls instead of panicking on `unwrap()`
        let u32_col = UInt32Chunked::from_iter_options(
            date_col_name,
            dates
                .into_iter()
                .map(|opt_date| opt_date.and_then(|s| s.replace('-', "").parse::<u32>().ok())),
        )
        .into_series();
        // Swap the new integer column into the DataFrame in place, keeping the same name
        df.replace(date_col_name, u32_col)?;
        Ok(df)
    }
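A possible variant, not the approach used in the answer: since the column's iterator yields `Option<&str>`, a per-row function with that signature could sit inside the `.map(...)` call instead. This sketch (the name `to_yyyymmdd` is hypothetical) reads the fixed-width `YYYY-MM-DD` layout byte by byte and builds the number arithmetically, so no intermediate `String` is allocated per row, and it rejects anything not shaped exactly like `YYYY-MM-DD`. It does not validate month or day ranges.

```rust
/// Hypothetical per-row converter: "YYYY-MM-DD" -> YYYYMMDD as u32.
/// Returns None for nulls or anything not shaped exactly like YYYY-MM-DD.
fn to_yyyymmdd(date: Option<&str>) -> Option<u32> {
    let b = date?.as_bytes();
    // Require exactly 10 bytes with hyphens at positions 4 and 7
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return None;
    }
    let mut n: u32 = 0;
    for (i, &c) in b.iter().enumerate() {
        if i == 4 || i == 7 {
            continue; // skip the hyphens
        }
        if !c.is_ascii_digit() {
            return None;
        }
        // Append one decimal digit; at most 8 digits, so no u32 overflow
        n = n * 10 + u32::from(c - b'0');
    }
    Some(n)
}

fn main() {
    // Stand-in for a nullable string column such as `time`
    let column = vec![Some("2023-05-31"), None, Some("2023/05/31")];
    let converted: Vec<Option<u32>> = column.into_iter().map(to_yyyymmdd).collect();
    println!("{:?}", converted); // [Some(20230531), None, None]
}
```

If the column really needs to be `u64` as the question asks, `u64::from(n)` widens each value losslessly.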