Search code examples
python-polars

How do I fill in missing factors in a polars dataframe?


I have this dataframe:

df = pl.DataFrame({
    'date':['date1','date1','date1','date2','date3','date3'], 
    'factor':['A','B','C','B','B','C'], 'val':[1,2,3,3,1,5]
})
shape: (6, 3)
┌───────┬────────┬─────┐
│ date  ┆ factor ┆ val │
│ ---   ┆ ---    ┆ --- │
│ str   ┆ str    ┆ i64 │
╞═══════╪════════╪═════╡
│ date1 ┆ A      ┆ 1   │
│ date1 ┆ B      ┆ 2   │
│ date1 ┆ C      ┆ 3   │
│ date2 ┆ B      ┆ 3   │
│ date3 ┆ B      ┆ 1   │
│ date3 ┆ C      ┆ 5   │
└───────┴────────┴─────┘

Some of the factors are missing. I'd like to fill in the gaps with values 0.

shape: (9, 3)
┌───────┬────────┬───────┐
│ date  ┆ factor ┆ value │
│ ---   ┆ ---    ┆ ---   │
│ str   ┆ str    ┆ i64   │
╞═══════╪════════╪═══════╡
│ date1 ┆ A      ┆ 1     │
│ date1 ┆ B      ┆ 2     │
│ date1 ┆ C      ┆ 3     │
│ date2 ┆ A      ┆ 0     │
│ date2 ┆ B      ┆ 3     │
│ date2 ┆ C      ┆ 0     │
│ date3 ┆ A      ┆ 0     │
│ date3 ┆ B      ┆ 1     │
│ date3 ┆ C      ┆ 5     │
└───────┴────────┴───────┘

Solution

  • For pure readability/"polars"icity, I think

    testdf.pivot(values="val", index="date", columns="factor", aggregate_function="first").melt(id_vars="date", variable_name="factor",value_name="value")

    ┌───────┬────────┬───────┐
    │ date  ┆ factor ┆ value │
    │ ---   ┆ ---    ┆ ---   │
    │ str   ┆ str    ┆ i64   │
    ╞═══════╪════════╪═══════╡
    │ date1 ┆ A      ┆ 1     │
    │ date2 ┆ A      ┆ null  │
    │ date3 ┆ A      ┆ null  │
    │ date1 ┆ B      ┆ 2     │
    │ date2 ┆ B      ┆ 3     │
    │ date3 ┆ B      ┆ 1     │
    │ date1 ┆ C      ┆ 3     │
    │ date2 ┆ C      ┆ null  │
    │ date3 ┆ C      ┆ 5     │
    └───────┴────────┴───────┘
    

    is good, since it makes most clear what you are trying to do, make the dataframe look like a usual "melted" one. I haven't benchmarked it though.