I am trying to achieve the following:
aggregate_commitment_df = commitment_df.groupby("sprint_start_date", "squad").agg(
stories_committed=pl.count(pl.col("issue_type") == "Story"),
spikes_committed=pl.count(pl.col("issue_type") == "Spike"),
bugs_committed=pl.count(pl.col("issue_type") == "Bug"),
story_points_committed=pl.sum("story_points"),
)
where, for example, stories_committed
will be the count of all rows with issue_type == "Story"
for each sprint_start_date
and squad
.
You'll want to use filter in the aggregation context before you use count
.
import polars as pl
from datetime import date
commitment_df = pl.DataFrame({
"sprint_start_date": [date(2023,8,1),date(2023,8,2),date(2023,8,3),] * 8,
"squad": ["team1", "team2"] * 12,
"issue_type": ["Story"] * 10 + ["Spike"] * 8 + ["Bug"] * 6,
"story_points": [*range(4)] * 6,
})
aggregate_commitment_df = commitment_df.groupby("sprint_start_date", "squad").agg(
stories_committed=pl.col("issue_type").filter(pl.col("issue_type") == "Story").count(),
spikes_committed=pl.col("issue_type").filter(pl.col("issue_type") == "Spike").count(),
bugs_committed=pl.col("issue_type").filter(pl.col("issue_type") == "Bug").count(),
story_points_committed=pl.sum("story_points"),
)
print(aggregate_commitment_df)
Results:
shape: (6, 6)
┌───────────────────┬───────┬───────────────────┬──────────────────┬────────────────┬────────────────────────┐
│ sprint_start_date ┆ squad ┆ stories_committed ┆ spikes_committed ┆ bugs_committed ┆ story_points_committed │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ date ┆ str ┆ u32 ┆ u32 ┆ u32 ┆ i64 │
╞═══════════════════╪═══════╪═══════════════════╪══════════════════╪════════════════╪════════════════════════╡
│ 2023-08-03 ┆ team2 ┆ 1 ┆ 2 ┆ 1 ┆ 8 │
│ 2023-08-02 ┆ team1 ┆ 1 ┆ 2 ┆ 1 ┆ 4 │
│ 2023-08-01 ┆ team1 ┆ 2 ┆ 1 ┆ 1 ┆ 4 │
│ 2023-08-03 ┆ team1 ┆ 2 ┆ 1 ┆ 1 ┆ 4 │
│ 2023-08-02 ┆ team2 ┆ 2 ┆ 1 ┆ 1 ┆ 8 │
│ 2023-08-01 ┆ team2 ┆ 2 ┆ 1 ┆ 1 ┆ 8 │
└───────────────────┴───────┴───────────────────┴──────────────────┴────────────────┴────────────────────────┘