I need to find the first non-null value in a column over a grouped pl.DataFrame
.
import polars as pl
df = pl.DataFrame(
{
"symbol": ["s1", "s1", "s2", "s2"],
"trade": [None, 1, -1, None],
}
)
shape: (4, 2)
┌────────┬───────┐
│ symbol ┆ trade │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════╪═══════╡
│ s1 ┆ null │
│ s1 ┆ 1 │
│ s2 ┆ -1 │
│ s2 ┆ null │
└────────┴───────┘
How can I get the row numbers/index values of the first non-null value in the trade
columns while group_by symbol
?
I am actually looking for the row/index numbers 1
and 0
. Maybe the result could be something like this:
shape: (2, 2)
┌────────┬────────────────┐
│ symbol ┆ first-non-null │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════╪════════════════╡
│ s1 ┆ 1 │
│ s2 ┆ 0 │
└────────┴────────────────┘
I am actually looking for the equivalent to pd.first_valid_index()
Here's one way using .arg_true().first()
:
print(
df.group_by("symbol").agg(
pl.col("trade").is_not_null().arg_true().first().alias("first-non-null")
)
)
Output:
┌────────┬────────────────┐
│ symbol ┆ first-non-null │
│ --- ┆ --- │
│ str ┆ u32 │
╞════════╪════════════════╡
│ s1 ┆ 1 │
│ s2 ┆ 0 │
└────────┴────────────────┘