python python-3.x dataframe python-polars

Pythonic way to get the absolute max values of all relevant columns in a Polars data frame


I want to get, for every column of a Polars data frame, the value with the largest absolute value (sign preserved). Here is one way that works, but it surely could be improved.

import numpy as np
import polars as pl

df = pl.DataFrame({
    "name": ["one", "one", "one", "two", "two", "two"],
    "val1": [1.2, -2.3, 3, -3.3, 2.2, -1.3],
    "val2": [1,2,3,-4,-3,-2]
    })

absVals = []
for col in df.columns:
    try:
        absVals.append((lambda arr: max(arr.min(), arr.max(), key=abs)) (df[col]))
    except Exception:  # e.g. abs() fails on the string column
        absVals.append(np.nan)

df_out= pl.DataFrame(data=absVals).transpose()
df_out.columns=df.columns
print(df_out)

Output:

[image: the resulting one-row data frame]


Solution

  • You can use the data frame's schema to pick out the numeric columns. With the same data:

    import numpy as np
    import polars as pl
    
    df = pl.DataFrame({
        "name": ["one", "one", "one", "two", "two", "two"],
        "val1": [1.2, -2.3, 3, -3.3, 2.2, -1.3],
        "val2": [1,2,3,-4,-3,-2]
        })
    

    polars.DataFrame.schema maps each column name to its data type.

    pl.NUMERIC_DTYPES contains all the numeric data types, so it can be used to test each column:

    for col, typ in df.schema.items():
        if typ in pl.NUMERIC_DTYPES:
            # value with the largest magnitude, sign preserved
            print(max(df[col], key=abs))
        else:
            # non-numeric column
            print(np.nan)
      
    

    The same thing as a one-line list comprehension:

    [max(df[col], key=abs) if typ in pl.NUMERIC_DTYPES else np.nan for col, typ in df.schema.items()]
    
    Output:
    # [nan, -3.3, -4]
    

    Bonus:

    Your code without the lambda:

    lst = []
    for col in df.columns:
        try:
            lst.append(max(df[col], key=abs))
        except Exception:  # abs() is undefined for the string column
            lst.append(np.nan)
    
    print(lst)
    # [nan, -3.3, -4]