Change the values of a polars DataFrame column


I am new to the polars library and I want to find the most efficient way to do the following:

I have the DataFrame

import polars as pl
df = pl.DataFrame({'col 1': [[1, 2, 3, 4, 5, 6],[11, 12, 13, 14, 15, 16],[21, 22, 23, 24, 25, 26]]})

and I want to change every list into a list of pairs of consecutive elements. For example, the lists in the first and second rows will turn into

[(1,2),(3,4),(5,6)] 
[(11,12),(13,14),(15,16)]

respectively.

A way to transform each list is with the following code

l = [1, 2, 3, 4, 5, 6]
[e for e in zip(l[::2], l[1::2])]
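
For reference, the same idiom applied to all three rows in plain Python (no polars involved) might look like the sketch below. The helper name `pair_consecutive` is made up for illustration. Note that `zip` stops at the shorter slice, so a list of odd length would silently lose its last element.

```python
def pair_consecutive(values):
    """Pair each even-indexed element with the element that follows it."""
    # values[::2] holds positions 0, 2, 4, ... and values[1::2] holds 1, 3, 5, ...
    # zip stops at the shorter slice, so a trailing unpaired element is dropped.
    return list(zip(values[::2], values[1::2]))


rows = [
    [1, 2, 3, 4, 5, 6],
    [11, 12, 13, 14, 15, 16],
    [21, 22, 23, 24, 25, 26],
]

# One list of pairs per original row.
paired_rows = [pair_consecutive(row) for row in rows]
```

This is the row-by-row result the polars version should reproduce.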

I know that polars works best with the Expression API. Can I do it utilizing the API?


Solution

  • Here's another way to do it. Because the task groups consecutive values, it amounts to changing the dimensions of each list from 6x1 to 3x2, which can be done by reshaping the underlying Series of each row with list.eval:

    df.with_columns(pl.col('col 1').list.eval(pl.element().reshape((-1, 2))))
    
    shape: (3, 1)
    ┌────────────────────────────────┐
    │ col 1                          │
    │ ---                            │
    │ list[array[i64, 2]]            │
    ╞════════════════════════════════╡
    │ [[1, 2], [3, 4], [5, 6]]       │
    │ [[11, 12], [13, 14], [15, 16]] │
    │ [[21, 22], [23, 24], [25, 26]] │
    └────────────────────────────────┘
    

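    Note that reshape produces an Array datatype: each pair is an array[i64, 2], so the column's dtype is list[array[i64, 2]] rather than a list of lists.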
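
To make the (-1, 2) shape concrete, here is a rough plain-Python model of what the reshape does to a single row. It is only a sketch and not how polars implements it: the function name `reshape_to_width` is hypothetical, and rejecting a length that does not divide evenly is an assumption of the sketch. The -1 stands for the dimension that is inferred from the length.

```python
def reshape_to_width(values, width):
    """Split a flat list into consecutive chunks of exactly `width` items."""
    # Assumption of this sketch: a length that is not a multiple of `width`
    # is treated as an error instead of being padded or truncated.
    if width <= 0 or len(values) % width != 0:
        raise ValueError(f"cannot reshape {len(values)} values into rows of {width}")
    # This is the dimension that -1 stands for in the shape (-1, 2).
    n_rows = len(values) // width
    return [values[i * width:(i + 1) * width] for i in range(n_rows)]


reshaped = reshape_to_width([1, 2, 3, 4, 5, 6], 2)
```

For a row of six values and a width of 2, the inferred dimension is 3, matching the 6x1 to 3x2 description above.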