
rolling_map with Null as return value in polars


Update: This issue is no longer present in Polars, and my function runs without error.


I want to use a custom function with rolling_map in Polars. However, the code below raises a TypeError.

import polars as pl

def ts_rank(expr: pl.Expr, window: int) -> pl.Expr:
    res = expr.cast(pl.Float64).rolling_map(
        lambda s: s.rank(method='average', descending=False)[-1]/s.is_not_null().sum(), 
        window_size = window,
        min_periods = window//2).over('a')
    return res
df = pl.DataFrame({"a": [1, 1, 1, 1, 2, 2, 2, 2], 
                   "b": [None, None, None, 1, 4, 2, 3, 8]})
df.with_columns(ts_rank(pl.col('b'),4).alias('rank'))

I got this error:

PanicException: python function failed: PyErr { type: <class 'TypeError'>, value: TypeError("unsupported operand type(s) for /: 'NoneType' and 'int'"), traceback: Some(<traceback object at 0x...>) }

Is this the correct Polars way to compute a rolling rank? (For my own purposes, I need to write it as an Expr rather than use DataFrame.rolling.)


Solution

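  • The error occurs when the last value in a window is null: rank() leaves it as null, so [-1] yields None, and None divided by an int raises the TypeError. Returning a bare None (or any other non-numeric value) from the function passed to rolling_map won't work in this case; returning a pl.Series with a dtype of pl.Float64 will.

    You can wrap the needed None in a new one-element pl.Series.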

    pl.Series(values=[None], dtype=pl.Float64)
    

    Here it is applied in your ts_rank function.

    import polars as pl
    
    
    def ts_rank(expr: pl.Expr, window: int) -> pl.Expr:
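        def rank(s: pl.Series):
            # Rank of the window's last value; None when that value is null.
            tmp = s.rank(method="average", descending=False)[-1]
            if tmp is None:
                # Return a typed null instead of dividing None by an int.
                return pl.Series(values=[None], dtype=pl.Float64)
            # Scale the rank by the number of non-null values in the window.
            return tmp / s.is_not_null().sum()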
    
        res = (
            expr.cast(pl.Float64)
            .rolling_map(rank, window_size=window, min_periods=window // 2)
            .over("a")
        )
        return res
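
    To make the per-window logic easier to check by hand, here is a minimal sketch in plain Python with no Polars dependency. The helper name window_rank is hypothetical and not part of Polars. It assumes the behaviour described above: nulls are excluded from the ranking, ties receive the average rank, and a window whose last value is null yields a null result.

```python
from typing import Optional, Sequence


def window_rank(window: Sequence[Optional[float]]) -> Optional[float]:
    """Average rank of the last value among the non-null values, divided by their count."""
    last = window[-1]
    if last is None:
        # Nothing to rank: return a null rather than dividing None by an int.
        return None
    valid = [v for v in window if v is not None]
    smaller = sum(v < last for v in valid)
    ties = sum(v == last for v in valid)  # includes the last value itself
    # A tied group occupies ranks smaller+1 .. smaller+ties; take their mean.
    avg_rank = smaller + (ties + 1) / 2
    return avg_rank / len(valid)


print(window_rank([None, None, None, 1.0]))  # 1.0
print(window_rank([4.0, 2.0, 3.0, 8.0]))     # 1.0
print(window_rank([1.0, None]))              # None
```

    The last call is the case that raised the TypeError in the question: the window ends in a null, so there is no rank to divide.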