Tags: python, numpy, python-polars

Is there a way to vectorise over ragged arrays in Polars?


I have a column of lists with different lengths, like the one below, and I want to apply np.diff to each list independently, in a vectorised (parallel) way.

import polars as pl
import numpy as np
np.random.seed(0)
ragged_arrays = [np.random.randint(10, size=np.random.choice(range(10))) for _ in range(5)]

df = pl.DataFrame({'values':ragged_arrays})
df

shape: (5, 1)
┌──────────────────────────┐
│ values                   │
│ ---                      │
│ list[i64]                │
╞══════════════════════════╡
│ [0, 3, 3, 7, 9]          │
│ [5, 2, 4]                │
│ [6, 8, 8, 1, 6, 7, 7]    │
│ [1, 5, 9, 8, 9, 4, 3, 0] │
│ [5, 0, 2]                │
└──────────────────────────┘

I tried applying np.diff directly to the column:

df.select(
    np.diff(pl.col("values"))
)

But it gives me this error:

ValueError: diff requires input that is at least one dimensional

It looks like this kind of vectorisation is not supported at the moment. Is there a workaround that achieves the same thing in Polars? I want to avoid grouping the arrays by length before running this.


Solution

  • Polars exposes its list operations through the list namespace (pl.col(...).list).

    In this case, Polars has its own .list.diff(), which takes the differences within each list independently:

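    import numpy as np
    import polars as pl

    # Same ragged data as in the question, with each array wrapped in a pl.Series
    np.random.seed(0)
    ragged_arrays = [pl.Series(np.random.randint(10, size=np.random.choice(range(10)))) for _ in range(5)]
    
    # Difference between consecutive elements within each list
    (pl.DataFrame({
        "values": ragged_arrays
    }).with_columns(
        pl.col("values").list.diff().alias("values_diff")
    ))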
    

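    This yields the output below. Unlike np.diff, which returns one element fewer than its input, .list.diff() keeps each list at its original length and puts a null in the first position: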

    shape: (5, 2)
    ┌──────────────────────────┬─────────────────────────────────┐
    │ values                   ┆ values_diff                     │
    │ ---                      ┆ ---                             │
    │ list[i64]                ┆ list[i64]                       │
    ╞══════════════════════════╪═════════════════════════════════╡
    │ [0, 3, 3, 7, 9]          ┆ [null, 3, 0, 4, 2]              │
    │ [5, 2, 4]                ┆ [null, -3, 2]                   │
    │ [6, 8, 8, 1, 6, 7, 7]    ┆ [null, 2, 0, -7, 5, 1, 0]       │
    │ [1, 5, 9, 8, 9, 4, 3, 0] ┆ [null, 4, 4, -1, 1, -5, -1, -3] │
    │ [5, 0, 2]                ┆ [null, -5, 2]                   │
    └──────────────────────────┴─────────────────────────────────┘