I have this dataframe:
df = pl.DataFrame({
'date':['date1','date1','date1','date2','date3','date3'],
'factor':['A','B','C','B','B','C'], 'val':[1,2,3,3,1,5]
})
shape: (6, 3)
┌───────┬────────┬─────┐
│ date ┆ factor ┆ val │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═══════╪════════╪═════╡
│ date1 ┆ A ┆ 1 │
│ date1 ┆ B ┆ 2 │
│ date1 ┆ C ┆ 3 │
│ date2 ┆ B ┆ 3 │
│ date3 ┆ B ┆ 1 │
│ date3 ┆ C ┆ 5 │
└───────┴────────┴─────┘
Some of the factors are missing. I'd like to fill in the gaps with values 0.
shape: (9, 3)
┌───────┬────────┬───────┐
│ date ┆ factor ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═══════╪════════╪═══════╡
│ date1 ┆ A ┆ 1 │
│ date1 ┆ B ┆ 2 │
│ date1 ┆ C ┆ 3 │
│ date2 ┆ A ┆ 0 │
│ date2 ┆ B ┆ 3 │
│ date2 ┆ C ┆ 0 │
│ date3 ┆ A ┆ 0 │
│ date3 ┆ B ┆ 1 │
│ date3 ┆ C ┆ 5 │
└───────┴────────┴───────┘
For pure readability/"polars"icity, I think
testdf.pivot(values="val", index="date", columns="factor", aggregate_function="first").melt(id_vars="date", variable_name="factor",value_name="value")
┌───────┬────────┬───────┐
│ date ┆ factor ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═══════╪════════╪═══════╡
│ date1 ┆ A ┆ 1 │
│ date2 ┆ A ┆ null │
│ date3 ┆ A ┆ null │
│ date1 ┆ B ┆ 2 │
│ date2 ┆ B ┆ 3 │
│ date3 ┆ B ┆ 1 │
│ date1 ┆ C ┆ 3 │
│ date2 ┆ C ┆ null │
│ date3 ┆ C ┆ 5 │
└───────┴────────┴───────┘
is good, since it makes most clear what you are trying to do, make the dataframe look like a usual "melted" one. I haven't benchmarked it though.