Search code examples
python-polars

Repeat values to next rows based on column values


I would like to update the values of columns 'e' and 'f' in such a way that, if column 'diff' has a value of either '-1' or '1' then assign the values of columns 'b' and 'c' correspondingly to column 'e' and 'f' for the current row and also to subsequent rows until the value of column 'diff' changes to '-1' or '1'.

Below is the input.

import polars as pl
pl.Config(tbl_rows=20)
df= pl.LazyFrame({
    "a" : [0,1,2,3,4,5,6,7,8,9],
    "b" : [234,34,43,6,343,76,67,23,878,12],
    "c" : [934,23,1,34,2,87,23,698,65,56],
    "d" : [1,1,0,0,0,0,1,1,1,1]
})
df = df.with_columns(pl.col('d').diff().alias('diff')).fill_null(0)

df.collect()
shape: (10, 5)
┌─────┬─────┬─────┬─────┬──────┐
│ a   ┆ b   ┆ c   ┆ d   ┆ diff │
│ --- ┆ --- ┆ --- ┆ --- ┆ ---  │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64  │
╞═════╪═════╪═════╪═════╪══════╡
│ 0   ┆ 234 ┆ 934 ┆ 1   ┆ 0    │
│ 1   ┆ 34  ┆ 23  ┆ 1   ┆ 0    │
│ 2   ┆ 43  ┆ 1   ┆ 0   ┆ -1   │
│ 3   ┆ 6   ┆ 34  ┆ 0   ┆ 0    │
│ 4   ┆ 343 ┆ 2   ┆ 0   ┆ 0    │
│ 5   ┆ 76  ┆ 87  ┆ 0   ┆ 0    │
│ 6   ┆ 67  ┆ 23  ┆ 1   ┆ 1    │
│ 7   ┆ 23  ┆ 698 ┆ 1   ┆ 0    │
│ 8   ┆ 878 ┆ 65  ┆ 1   ┆ 0    │
│ 9   ┆ 12  ┆ 56  ┆ 1   ┆ 0    │
└─────┴─────┴─────┴─────┴──────┘

What I am looking for is arriving at this output.

final_df = df= pl.LazyFrame({
    "a" : [0,1,2,3,4,5,6,7,8,9],
    "b" : [234,34,43,6,343,76,67,23,878,12],
    "c" : [934,23,1,34,2,87,23,698,65,56],
    "d" : [1,1,0,0,0,0,1,1,1,1],
    "e" : [234,34,43,43,43,43,67,67,67,67],
    "f" : [934,23,1,1,1,1,23,23,23,23]
})
final_df.collect()
shape: (10, 6)
┌─────┬─────┬─────┬─────┬─────┬─────┐
│ a   ┆ b   ┆ c   ┆ d   ┆ e   ┆ f   │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╪═════╪═════╡
│ 0   ┆ 234 ┆ 934 ┆ 1   ┆ 234 ┆ 934 │
│ 1   ┆ 34  ┆ 23  ┆ 1   ┆ 34  ┆ 23  │
│ 2   ┆ 43  ┆ 1   ┆ 0   ┆ 43  ┆ 1   │
│ 3   ┆ 6   ┆ 34  ┆ 0   ┆ 43  ┆ 1   │
│ 4   ┆ 343 ┆ 2   ┆ 0   ┆ 43  ┆ 1   │
│ 5   ┆ 76  ┆ 87  ┆ 0   ┆ 43  ┆ 1   │
│ 6   ┆ 67  ┆ 23  ┆ 1   ┆ 67  ┆ 23  │
│ 7   ┆ 23  ┆ 698 ┆ 1   ┆ 67  ┆ 23  │
│ 8   ┆ 878 ┆ 65  ┆ 1   ┆ 67  ┆ 23  │
│ 9   ┆ 12  ┆ 56  ┆ 1   ┆ 67  ┆ 23  │
└─────┴─────┴─────┴─────┴─────┴─────┘

How to best achieve this with the polars API?

The following question Repeat rows in a Polars DataFrame based on column value , unfortunately is not helpful in my case.


Solution

  • The steps look like:

    A struct can be used to pack multiple columns together.

    new_cols = pl.struct(e = "b", f = "c")
    
    df.with_columns(
       pl.when(pl.col.d.diff() != 0).then(new_cols)
    )
    
    shape: (10, 5)
    ┌─────┬─────┬─────┬─────┬───────────┐
    │ a   ┆ b   ┆ c   ┆ d   ┆ e         │
    │ --- ┆ --- ┆ --- ┆ --- ┆ ---       │
    │ i64 ┆ i64 ┆ i64 ┆ i64 ┆ struct[2] │
    ╞═════╪═════╪═════╪═════╪═══════════╡
    │ 0   ┆ 234 ┆ 934 ┆ 1   ┆ null      │
    │ 1   ┆ 34  ┆ 23  ┆ 1   ┆ null      │
    │ 2   ┆ 43  ┆ 1   ┆ 0   ┆ {43,1}    │
    │ 3   ┆ 6   ┆ 34  ┆ 0   ┆ null      │
    │ 4   ┆ 343 ┆ 2   ┆ 0   ┆ null      │
    │ 5   ┆ 76  ┆ 87  ┆ 0   ┆ null      │
    │ 6   ┆ 67  ┆ 23  ┆ 1   ┆ {67,23}   │
    │ 7   ┆ 23  ┆ 698 ┆ 1   ┆ null      │
    │ 8   ┆ 878 ┆ 65  ┆ 1   ┆ null      │
    │ 9   ┆ 12  ┆ 56  ┆ 1   ┆ null      │
    └─────┴─────┴─────┴─────┴───────────┘
    

    Add in the remaining steps:

    df.with_columns(
       pl.when(pl.col.d.diff() != 0).then(new_cols)
         .forward_fill()
         .fill_null(new_cols)
         .struct.field("e", "f") # unpack into columns
    )
    
    shape: (10, 6)
    ┌─────┬─────┬─────┬─────┬─────┬─────┐
    │ a   ┆ b   ┆ c   ┆ d   ┆ e   ┆ f   │
    │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
    │ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
    ╞═════╪═════╪═════╪═════╪═════╪═════╡
    │ 0   ┆ 234 ┆ 934 ┆ 1   ┆ 234 ┆ 934 │
    │ 1   ┆ 34  ┆ 23  ┆ 1   ┆ 34  ┆ 23  │
    │ 2   ┆ 43  ┆ 1   ┆ 0   ┆ 43  ┆ 1   │
    │ 3   ┆ 6   ┆ 34  ┆ 0   ┆ 43  ┆ 1   │
    │ 4   ┆ 343 ┆ 2   ┆ 0   ┆ 43  ┆ 1   │
    │ 5   ┆ 76  ┆ 87  ┆ 0   ┆ 43  ┆ 1   │
    │ 6   ┆ 67  ┆ 23  ┆ 1   ┆ 67  ┆ 23  │
    │ 7   ┆ 23  ┆ 698 ┆ 1   ┆ 67  ┆ 23  │
    │ 8   ┆ 878 ┆ 65  ┆ 1   ┆ 67  ┆ 23  │
    │ 9   ┆ 12  ┆ 56  ┆ 1   ┆ 67  ┆ 23  │
    └─────┴─────┴─────┴─────┴─────┴─────┘