
Polars: using the Expression API with a DataFrame's rows


I am a new polars user and I want to apply a function to every row of a polars DataFrame. In pandas I would use the apply function and specify that the function's input is a row of the DataFrame rather than its column(s).

I looked at the apply function in the polars library, and its documentation says that using the Expression API is preferable to calling apply on a polars DataFrame because it is much more efficient. The documentation has examples of the Expression API with the select function, but select operates on the DataFrame's columns. Is there a way to use the Expression API with the rows of the DataFrame?

Edit: adding an example

I have a DataFrame with this structure:

import polars as pl

l=[(1,2,3,4,22,23,None,None),(5,6,8,10,None,None,None,None)]
df=pl.DataFrame(data=l, orient='row')

That is, in each row the values are None from some column onward until the end of the row. In this example (counting columns from 0), the None values start at column 6 in the first row and at column 4 in the second.

I want to find the most efficient polars way to turn this DataFrame into one with only three columns: the first column holds the first element of the row, the second column holds the second element, and the third holds a list of all the remaining elements of the row that are not None.
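As a plain-Python sketch of the intended result (the helper name `split_row` is hypothetical, and this only serves as a reference for the expected output, not as an efficient solution):

```python
# Plain-Python reference for the expected result; not meant to be efficient.
rows = [(1, 2, 3, 4, 22, 23, None, None), (5, 6, 8, 10, None, None, None, None)]


def split_row(row):
    """Return (first, second, list of the remaining non-None values)."""
    first, second, *rest = row
    return first, second, [value for value in rest if value is not None]


expected = [split_row(row) for row in rows]
print(expected)  # [(1, 2, [3, 4, 22, 23]), (5, 6, [8, 10])]
```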


Solution

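  • If you select by column name, you can keep the first two columns, gather the remaining ones into a list with pl.concat_list, and drop the nulls from that list: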

    df.select(
        pl.col("column_0", "column_1"),
        pl.concat_list(pl.exclude("column_0", "column_1")).list.drop_nulls(),
    )
    
    shape: (2, 3)
    ┌──────────┬──────────┬──────────────┐
    │ column_0 ┆ column_1 ┆ column_2     │
    │ ---      ┆ ---      ┆ ---          │
    │ i64      ┆ i64      ┆ list[i64]    │
    ╞══════════╪══════════╪══════════════╡
    │ 1        ┆ 2        ┆ [3, 4, … 23] │
    │ 5        ┆ 6        ┆ [8, 10]      │
    └──────────┴──────────┴──────────────┘