python dataframe apply python-polars

Apply function to all columns of a Polars-DataFrame


I know how to apply a function to all columns of a Pandas DataFrame. However, I have not yet figured out how to do the same with a Polars DataFrame.

I checked the section of the Polars User Guide devoted to this topic, but I have not found the answer. Below is a code snippet with my unsuccessful attempts.

import numpy as np
import polars as pl
import seaborn as sns

# Loading toy dataset as Pandas DataFrame using Seaborn
df_pd = sns.load_dataset('iris')

# Converting Pandas DataFrame to Polars DataFrame
df_pl = pl.DataFrame(df_pd)

# Dropping the non-numeric column...
df_pd = df_pd.drop(columns='species')                     # ... using Pandas
df_pl = df_pl.drop('species')                             # ... using Polars

# Applying function to the whole DataFrame...
df_pd_new = df_pd.apply(np.log2)                          # ... using Pandas
# df_pl_new = df_pl.apply(np.log2)                        # ... using Polars?

# Applying lambda function to the whole DataFrame...
df_pd_new = df_pd.apply(lambda c: np.log2(c))             # ... using Pandas
# df_pl_new = df_pl.apply(lambda c: np.log2(c))           # ... using Polars?

Thanks in advance for your help and your time.


Solution

  • You can use the expression syntax to select all columns with pl.all() and then use map_batches to apply NumPy's np.log2(..) function to each column.

    df.select(
        pl.all().map_batches(np.log2)
    )
    

    Note that we choose map_batches here because map_elements would call the function once for each individual value.

    map_elements = pl.Series(np.log2(value) for value in pl.Series([1, 2, 3]))
    

    But np.log2 is vectorized, so it can be called once on all the values of a column, which is faster.

    map_batches = np.log2(pl.Series([1, 2, 3]))
    

    See the User guide for more.

    • map_elements: Call a function separately on each value in the Series.
    • map_batches: Always passes the full Series to the function.

    NumPy

    Polars expressions also support NumPy universal functions (ufuncs).

    That means you can pass a Polars expression directly to a NumPy ufunc:

    df.select(
        np.log2(pl.all())
    )