Search code examples
pythonpython-polarsibis

Use ibis-framework to compute shifts (lags) in dataframe


Say I want to do the following in Polars:

df.with_columns(
    a_1 = pl.col('a').shift(1),
    a_2 = pl.col('a').shift(2),
    b_1 = pl.col('b').shift(1),
    b_2 = pl.col('b').shift(2),
)

Say, starting from

import polars as pl

df = pl.DataFrame({'a': [1,3,2,4], 'b': [5,1,2,1]})

So, desired output:

shape: (4, 6)
┌─────┬─────┬──────┬──────┬──────┬──────┐
│ a   ┆ b   ┆ a_1  ┆ a_2  ┆ b_1  ┆ b_2  │
│ --- ┆ --- ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64 ┆ i64 ┆ i64  ┆ i64  ┆ i64  ┆ i64  │
╞═════╪═════╪══════╪══════╪══════╪══════╡
│ 1   ┆ 5   ┆ null ┆ null ┆ null ┆ null │
│ 3   ┆ 1   ┆ 1    ┆ null ┆ 5    ┆ null │
│ 2   ┆ 2   ┆ 3    ┆ 1    ┆ 1    ┆ 5    │
│ 4   ┆ 1   ┆ 2    ┆ 3    ┆ 2    ┆ 1    │
└─────┴─────┴──────┴──────┴──────┴──────┘

How can I do this with ibis?

I need to do this starting from the Polars dataframe df, so I need to connect to that using Ibis. Looking at the docs doesn't give much away in terms of what I'm actually supposed to do


Is it correct to do:

t = ibis.memtable(df)
t = t.mutate(
    ibis._['a'].lead(1),
    ibis._['a'].lead(2),
    ibis._['b'].lead(1),
    ibis._['b'].lead(2),
)
t.to_polars()

Solution

  • you can use lag() window function:

    t.mutate(
        a_1 = t.a.lag(1),
        a_2 = t.a.lag(2),
        b_1 = t.b.lag(1),
        b_2 = t.b.lag(2)
    ).to_polars()
    
    
    shape: (4, 6)
    ┌─────┬─────┬──────┬──────┬──────┬──────┐
    │ a   ┆ b   ┆ a_1  ┆ a_2  ┆ b_1  ┆ b_2  │
    │ --- ┆ --- ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
    │ i64 ┆ i64 ┆ i64  ┆ i64  ┆ i64  ┆ i64  │
    ╞═════╪═════╪══════╪══════╪══════╪══════╡
    │ 1   ┆ 5   ┆ null ┆ null ┆ null ┆ null │
    │ 3   ┆ 1   ┆ 1    ┆ null ┆ 5    ┆ null │
    │ 2   ┆ 2   ┆ 3    ┆ 1    ┆ 1    ┆ 5    │
    │ 4   ┆ 1   ┆ 2    ┆ 3    ┆ 2    ┆ 1    │
    └─────┴─────┴──────┴──────┴──────┴──────┘