I am new to the polars library and I want to find the most efficient way to do the following:
I have the DataFrame
df = pl.DataFrame({'col 1': [[1, 2, 3, 4, 5, 6],[11, 12, 13, 14, 15, 16],[21, 22, 23, 24, 25, 26]]})
and I want to change every list into a list of pairs of consecutive elements. For example the lists of the first and second rows will turn into
[(1,2),(3,4),(5,6)]
[(11,12),(13,14),(15,16)]
respectively.
A way to transform each list is with the following code
l = [1, 2, 3, 4, 5, 6]
[e for e in zip(l[::2], l[1::2])]
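For reference, the plain-Python approach applied to every row might look like the sketch below. The helper name `pair_up` is made up for illustration. Because `zip` stops at the shorter slice, a trailing unpaired element in an odd-length list would be silently dropped.

```python
def pair_up(values):
    """Group consecutive elements into pairs; a trailing odd element is dropped."""
    return list(zip(values[::2], values[1::2]))


rows = [
    [1, 2, 3, 4, 5, 6],
    [11, 12, 13, 14, 15, 16],
    [21, 22, 23, 24, 25, 26],
]

# Apply the helper row by row, outside of polars.
paired = [pair_up(row) for row in rows]
print(paired[0])  # [(1, 2), (3, 4), (5, 6)]
```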
I know that polars works best with the Expression API. Can I do it utilizing the API?
Here's another, more subtle way to do it. Since we're grouping consecutive values, we're effectively changing the dimensions of each list from 6x1 to 3x2, so we can reshape the underlying Series of each row with list.eval:
df.with_columns(pl.col('col 1').list.eval(pl.element().reshape((-1, 2))))
shape: (3, 1)
┌────────────────────────────────┐
│ col 1                          │
│ ---                            │
│ list[array[i64, 2]]            │
╞════════════════════════════════╡
│ [[1, 2], [3, 4], [5, 6]]       │
│ [[11, 12], [13, 14], [15, 16]] │
│ [[21, 22], [23, 24], [25, 26]] │
└────────────────────────────────┘
Do note that reshape results in an Array datatype: the column becomes list[array[i64, 2]] rather than a list of lists.