
Reducing code/expression duplication when using Polars `with_columns`?


Consider some Polars code like so:

df.with_columns(
    pl.date_ranges(
        pl.col("current_start"), pl.col("current_end"), "1mo", closed="left"
    ).alias("current_tpoints")
).drop("current_start", "current_end").with_columns(
    pl.date_ranges(
        pl.col("history_start"), pl.col("history_end"), "1mo", closed="left"
    ).alias("history_tpoints")
).drop(
    "history_start", "history_end"
)

The key issue to note here is the repetitiveness of history_* and current_*. I could reduce duplication by doing this:

for x in ["history", "current"]:
    fstring = f"{x}" + "_{other}"
    start = fstring.format(other="start")
    end = fstring.format(other="end")
    df = df.with_columns(
        pl.date_ranges(
            pl.col(start),
            pl.col(end),
            "1mo",
            closed="left",
        ).alias(fstring.format(other="tpoints"))
    ).drop(start, end)

But are there any other ways to reduce duplication I ought to consider?


Solution

  • If you don't need any of the original columns, you can use select() instead of with_columns(). select() returns only the columns you list, so there is nothing left to drop().

    You can also loop over the column-name prefixes directly inside select() / with_columns():

    df.select(
        pl.date_ranges(
            pl.col(f"{c}_start"), pl.col(f"{c}_end"), "1mo", closed="left"
        ).alias(f"{c}_tpoints") for c in ["current", "history"]
    )
    

    To explain why this works:

    According to the documentation, both select() and with_columns() accept *exprs: IntoExpr | Iterable[IntoExpr], i.e. a variable number of positional arguments. You can pass either several expressions as separate arguments, or a single iterable (such as a list) of expressions.

    A list comprehension gives us exactly that: a list of expressions.

    [
        pl.date_ranges(
            pl.col(f"{c}_start"), pl.col(f"{c}_end"), "1mo", closed="left"
        ).alias(f"{c}_tpoints") for c in ["current", "history"]
    ]
    
    [<Expr ['col("current_start").date_rang…'] at 0x206D93030E0>,
     <Expr ['col("history_start").date_rang…'] at 0x206D8F85520>]
    

    We can then pass this list to the Polars method. Notice that the final answer has no square brackets: we don't actually need a list of expressions, just an iterable, and a generator expression works as well. Because the generator is the only argument, the call's own parentheses are enough to delimit it.