Search code examples
pythonnumpydatetimeselectpython-polars

Convert column to Numpy within a select statement in Polars


I am trying to transform a date column to the next business day after each date (if the date isn't already a business day in which case it remains unchanged). To do this I am using a Numpy function called busday_offset which takes a Numpy array as its first parameter. The line of code bellow doesn't work because pl.col('creation_date') isn't a Polars column but some kind of abstract object. Thus, ".to_numpy" can't be called on it to convert the column to a numpy array.

sales_orders.with_columns(
    [
        pl.lit(np.busday_offset(pl.col('creation_date').to_numpy(), 0, roll='forward'))
    ]
)

-> AttributeError: 'Expr' object has no attribute 'to_numpy'

I am aware that a workaround would be to use the ".get_column" method on the "sales_orders" table as such:

sales_orders.with_columns(
    [
        pl.lit(np.busday_offset(sales_orders.get_column('creation_date').to_numpy(), 0, roll='forward'))
    ]
)

But this lacks elegance and makes it impossible to have nested "with_columns" statements as such:

sales_orders.with_columns(
    [
        pl.lit(np.busday_offset(sales_orders.get_column('creation_date'), 0, roll='forward')).alias('order_acknowledgement_date')
    ]
).with_columns(
    [
        pl.lit(np.busday_offset(sales_orders.get_column('order_acknowledgement_date'), 2, roll='forward')).alias('ship_due_date')
    ]
)

-> ColumnNotFoundError: order_acknowledgement_date This indeed doesn't work because "order_acknowledgement_date" isn't part of the "sales_orders" table.

So my question is: how can I transform pl.col('creation_date') into a Numpy array within a select/with_columns statement without refering to the table sales_orders directly?


Solution

  • You can use .map()

    df = pl.DataFrame({"date": ["2011-03-20", "2011-03-22", "2011-03-24"]})
    df = df.with_columns(pl.col("date").str.to_date())
    
    df.with_columns(busday = 
       pl.col("date").map(lambda date: 
          pl.Series(np.busday_offset(date, 0, roll="forward"))
       )
    )
    
    shape: (3, 2)
    ┌────────────┬────────────┐
    │ date       ┆ busday     │
    │ ---        ┆ ---        │
    │ date       ┆ date       │
    ╞════════════╪════════════╡
    │ 2011-03-20 ┆ 2011-03-21 │
    │ 2011-03-22 ┆ 2011-03-22 │
    │ 2011-03-24 ┆ 2011-03-24 │
    └────────────┴────────────┘