Polars - How to fill_null with mode on column (&str)


I'm trying out Polars (Rust) on the Kaggle Titanic dataset (https://www.kaggle.com/competitions/titanic/data), which has a "Cabin" column containing null values.

I've been trying to use fill_null to replace those nulls with the mode of the column, but it doesn't seem to change anything.

fn main() -> Result<()> {
    let q = LazyCsvReader::new("data/train.csv")
        .has_header(true)
        .finish()?;

    let df = q
        .collect()?;

    let fill = df.clone()
        .lazy()
        .with_columns([col("Cabin").fill_null(col("Cabin").mode())])
        .collect()?;

    println!("{:?}", df.null_count());
    println!("{:?}", fill.null_count());

    Ok(())
}

The output is:

shape: (1, 12)
┌─────────────┬──────────┬────────┬──────┬───┬────────┬──────┬───────┬──────────┐
│ PassengerId ┆ Survived ┆ Pclass ┆ Name ┆ … ┆ Ticket ┆ Fare ┆ Cabin ┆ Embarked │
│ ---         ┆ ---      ┆ ---    ┆ ---  ┆   ┆ ---    ┆ ---  ┆ ---   ┆ ---      │
│ u32         ┆ u32      ┆ u32    ┆ u32  ┆   ┆ u32    ┆ u32  ┆ u32   ┆ u32      │
╞═════════════╪══════════╪════════╪══════╪═══╪════════╪══════╪═══════╪══════════╡
│ 0           ┆ 0        ┆ 0      ┆ 0    ┆ … ┆ 0      ┆ 0    ┆ 687   ┆ 2        │
└─────────────┴──────────┴────────┴──────┴───┴────────┴──────┴───────┴──────────┘
shape: (1, 12)
┌─────────────┬──────────┬────────┬──────┬───┬────────┬──────┬───────┬──────────┐
│ PassengerId ┆ Survived ┆ Pclass ┆ Name ┆ … ┆ Ticket ┆ Fare ┆ Cabin ┆ Embarked │
│ ---         ┆ ---      ┆ ---    ┆ ---  ┆   ┆ ---    ┆ ---  ┆ ---   ┆ ---      │
│ u32         ┆ u32      ┆ u32    ┆ u32  ┆   ┆ u32    ┆ u32  ┆ u32   ┆ u32      │
╞═════════════╪══════════╪════════╪══════╪═══╪════════╪══════╪═══════╪══════════╡
│ 0           ┆ 0        ┆ 0      ┆ 0    ┆ … ┆ 0      ┆ 0    ┆ 687   ┆ 2        │
└─────────────┴──────────┴────────┴──────┴───┴────────┴──────┴───────┴──────────┘

Am I missing something here?


Solution

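  • If you check the CSV, the most common value in "Cabin" is null (687 of them, per your null_count output), and mode() counts null as a candidate for "the most frequently occurring value".

    So it appears that what you are effectively doing is replacing every null with null, which is why the null count stays the same.

    Try filling with something other than the plain mode, or drop the nulls first and take the mode of what remains, with something like col("Cabin").drop_nulls().mode().first() (the .first() is there because mode() can return more than one value when there is a tie). You should then see the nulls replaced.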
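
To make the point above concrete, here is a minimal sketch in plain Rust (std only, no Polars) that models the column as a slice of Option<&str>, with None standing in for null. The helper names (mode_counting_nulls, mode_ignoring_nulls, fill_null_with, null_count) are made up for illustration. This is not how Polars implements mode() or fill_null, just the same logic in miniature.

```rust
use std::collections::HashMap;

/// Most frequent entry, counting None (null) as a value like any other.
/// Ties go to the entry that appears first. A result of None means that
/// null is the mode (or that the column is empty).
fn mode_counting_nulls<'a>(column: &[Option<&'a str>]) -> Option<&'a str> {
    let mut counts: HashMap<Option<&'a str>, usize> = HashMap::new();
    for value in column {
        *counts.entry(*value).or_insert(0) += 1;
    }

    let mut best: Option<&'a str> = None;
    let mut best_count = 0;
    for value in column {
        let count = counts[value];
        // Strictly greater, so the earliest entry wins a tie.
        if count > best_count {
            best = *value;
            best_count = count;
        }
    }
    best
}

/// Most frequent non-null entry: drop the nulls first, then take the mode.
fn mode_ignoring_nulls<'a>(column: &[Option<&'a str>]) -> Option<&'a str> {
    let non_null: Vec<Option<&'a str>> =
        column.iter().copied().filter(|v| v.is_some()).collect();
    mode_counting_nulls(&non_null)
}

/// Replace every null with `fill`. If `fill` is itself None, nothing changes.
fn fill_null_with<'a>(
    column: &[Option<&'a str>],
    fill: Option<&'a str>,
) -> Vec<Option<&'a str>> {
    column.iter().map(|v| (*v).or(fill)).collect()
}

/// Number of null (None) entries in the column.
fn null_count(column: &[Option<&str>]) -> usize {
    column.iter().filter(|v| v.is_none()).count()
}
```

With a hypothetical column such as [None, Some("C85"), None, Some("G6"), Some("G6"), None, None], mode_counting_nulls returns None, so fill_null_with leaves all four nulls in place, which mirrors the unchanged 687 in the question. mode_ignoring_nulls returns Some("G6"), and filling with that leaves no nulls.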