python python-polars

Find matching pairs and lay them out as columns in polars


Say I have this:

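import numpy
import polars

df = polars.DataFrame(dict(
    j=numpy.random.randint(10, 99, 9),  # nine random two-digit values
    k=numpy.tile([1, 2, 2], 3),         # marker pattern 1, 2, 2 repeated three times
))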
  
 j (i64)  k (i64)
 47       1
 22       2
 82       2
 19       1
 85       2
 15       2
 89       1
 74       2
 26       2
shape: (9, 2)

where column k is a marker: a 1 starts a group and is followed by one or more 2s (always two in the example above for simplicity, but in practice one or more). I'd like to get the values in j that correspond to each k=1 and to the last k=2 of its group. For the above:

 j (i64)  k (i64)
 47       1 >-\
 22       2   | these are the 1 and the last of its matching 2s
 82       2 <-/
 19       1 >-\
 85       2   | these are the 1 and the last of its matching 2s
 15       2 <-/
 89       1 >-\
 74       2   | these are the 1 and the last of its matching 2s
 26       2 <-/
shape: (9, 2)

and I'd like to put these in two columns, so I get this:

 j (i64)  k (i64)
 47       82
 19       15
 89       26
shape: (3, 2)

How would I approach this in polars?


Solution

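  • Select j where k is 1, and separately select j where the next k (k shifted up by one row) is 1. The shift leaves a null in the last row, so fill it with 1 to treat the final row as the end of its group: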

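    import polars as pl

    df.select(
        # j at each group start (k == 1)
        j=pl.col('j').filter(pl.col('k') == 1),
        # j at each group end: the next k is 1, or there is no next row
        k=pl.col('j').filter(pl.col('k').shift(-1).fill_null(1) == 1),
    )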
    
    shape: (3, 2)
    ┌─────┬─────┐
    │ j   ┆ k   │
    │ --- ┆ --- │
    │ i32 ┆ i32 │
    ╞═════╪═════╡
    │ 47  ┆ 82  │
    │ 19  ┆ 15  │
    │ 89  ┆ 26  │
    └─────┴─────┘