Search code examples
pythondataframepython-polars

Find the index of the first non-null value in a column in a polars dataframe


I need to find the first non-null value in a column over a grouped pl.DataFrame.

import polars as pl

df = pl.DataFrame(
    {
        "symbol": ["s1", "s1", "s2", "s2"],
        "trade": [None, 1, -1, None],
    }
)

shape: (4, 2)
┌────────┬───────┐
│ symbol ┆ trade │
│ ---    ┆ ---   │
│ str    ┆ i64   │
╞════════╪═══════╡
│ s1     ┆ null  │
│ s1     ┆ 1     │
│ s2     ┆ -1    │
│ s2     ┆ null  │
└────────┴───────┘

How can I get the row numbers/index values of the first non-null value in the trade columns while group_by symbol?

I am actually looking for the row/index numbers 1 and 0. Maybe the result could be something like this:

shape: (2, 2)
┌────────┬────────────────┐
│ symbol ┆ first-non-null │
│ ---    ┆ ---            │
│ str    ┆ i64            │
╞════════╪════════════════╡
│ s1     ┆ 1              │
│ s2     ┆ 0              │
└────────┴────────────────┘

I am actually looking for the equivalent to pd.first_valid_index()


Solution

  • Here's one way using .arg_true().first():

    print(
        df.group_by("symbol").agg(
            pl.col("trade").is_not_null().arg_true().first().alias("first-non-null")
        )
    )
    

    Output:

    ┌────────┬────────────────┐
    │ symbol ┆ first-non-null │
    │ ---    ┆ ---            │
    │ str    ┆ u32            │
    ╞════════╪════════════════╡
    │ s1     ┆ 1              │
    │ s2     ┆ 0              │
    └────────┴────────────────┘