I am a new polars user and I want to apply a function to every row of a polars DataFrame. In pandas I would use the apply function, specifying that the input of the function is the DataFrame's row instead of the DataFrame's column(s).
I saw the apply function of the polars library, and its documentation says that it is preferable to use the Expression API instead of the apply function on a polars DataFrame, because it is much more efficient. The documentation has examples of the Expression API with the select function, but select is used with the DataFrame's columns. Is there a way to use the Expression API with the rows of the DataFrame?
Edit to provide an example:
I have a DataFrame with this structure:
import polars as pl
l = [(1, 2, 3, 4, 22, 23, None, None), (5, 6, 8, 10, None, None, None, None)]
df = pl.DataFrame(data=l, orient='row')
i.e. a DataFrame in which, from some point until the end, each row has None values. In this example, in the first row the None values start at column 6, while in the second, the None values start at column 4.
What I want to do is to find the most efficient polars way to turn this DataFrame into a DataFrame with only three columns, where the first column is the first element of the row, the second column is the second element of the row, and the third holds, as a list, all the elements of the following columns that are not None.
If you're using the column names, you can use pl.concat_list to combine the remaining columns into a list, and .list.drop_nulls() to get rid of the None values:
df.select(
    pl.col("column_0", "column_1"),
    pl.concat_list(pl.exclude("column_0", "column_1"))
      .list.drop_nulls()
)
shape: (2, 3)
┌──────────┬──────────┬──────────────┐
│ column_0 ┆ column_1 ┆ column_2     │
│ ---      ┆ ---      ┆ ---          │
│ i64      ┆ i64      ┆ list[i64]    │
╞══════════╪══════════╪══════════════╡
│ 1        ┆ 2        ┆ [3, 4, … 23] │
│ 5        ┆ 6        ┆ [8, 10]      │
└──────────┴──────────┴──────────────┘
└──────────┴──────────┴──────────────┘