Tags: python, python-polars

Fastest way to use cmaps in Polars


I would like to create a column of color lists of type [r,g,b,a] from another float column using a matplotlib cmap.

Is there a faster way than:

data.with_columns(
    (pl.col("floatCol") / 100)
    .map_elements(cmap1)
)

Minimal working example:

import matplotlib as mpl
import polars as pl

cmap1 = mpl.colors.LinearSegmentedColormap.from_list("GreenBlue", ["limegreen", "blue"])


data = pl.DataFrame(
    {
        "floatCol": [12, 135.8, 1235.263, 15.236],
        "boolCol": [True, True, False, False],
    }
)

data = data.with_columns(
    pl.when(pl.col("boolCol").not_())
    .then(mpl.colors.to_rgba("r"))
    .otherwise(
        (pl.col("floatCol") / 100)
        .map_elements(cmap1)
    )
    .alias("c1")
)

Solution

  • If you're using map_elements, there is (nearly) always a faster way ;)

    data.with_columns(
        pl.when(pl.col("boolCol").not_())
        .then(mpl.colors.to_rgba("r"))
        .otherwise((pl.col("floatCol") / 100).map_batches(lambda x: pl.Series(cmap1(x))))
        .alias("c1")
    )
    

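    Use map_batches to operate over the entire column at once. A matplotlib colormap accepts a whole array of floats and returns one RGBA row per value, so cmap1 is called on the column as a whole rather than once per value.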