Say I have this:
import numpy
import polars as pl

df = pl.DataFrame(dict(
    j=numpy.random.randint(10, 99, 9),
    k=numpy.tile([1, 2, 2], 3),
))
j (i64) k (i64)
47 1
22 2
82 2
19 1
85 2
15 2
89 1
74 2
26 2
shape: (9, 2)
where column k is kind of a marker: a 1 starts a group, and it is followed by one or more 2s (in the above example always two for simplicity, but in practice one or more). I'd like to get the values in j that correspond to k=1 and to the last corresponding k=2. For the above:
j (i64) k (i64)
47 1 >-\
22 2 | these are the 1 and the last of its matching 2s
82 2 <-/
19 1 >-\
85 2 | these are the 1 and the last of its matching 2s
15 2 <-/
89 1 >-\
74 2 | these are the 1 and the last of its matching 2s
26 2 <-/
shape: (9, 2)
and I'd like to put these in two columns, so I get this:
j (i64) k (i64)
47 82
19 15
89 26
shape: (3, 2)
How would I approach this in polars?
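You can filter simply by looking for rows where k is 1, or where the next k (i.e. k after a shift of -1) is 1. The last row has no next value, so the shifted null is filled with 1 to keep it: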
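df.select(
    # values of j on the rows that start a group (k == 1)
    j=pl.col('j').filter(pl.col('k') == 1),
    # values of j on the last row of each group (next k is 1, or no next row)
    k=pl.col('j').filter(pl.col('k').shift(-1).fill_null(1) == 1),
)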
shape: (3, 2)
┌─────┬─────┐
│ j   ┆ k   │
│ --- ┆ --- │
│ i32 ┆ i32 │
╞═════╪═════╡
│ 47  ┆ 82  │
│ 19  ┆ 15  │
│ 89  ┆ 26  │
└─────┴─────┘