Given the following 3 Polars dataframes
journeys = pl.DataFrame({'id':[1,2,3,4,5,6],'order_id':[11,12,13,14,14,15],'order_type':['restaurant','restaurant','restaurant','restaurant','grocery','grocery']})
restaurant_orders = pl.DataFrame({'id':[11,12,13,14],'item_count':[4,7,3,5]})
grocery_orders = pl.DataFrame({'id':[14,15],'item_count':[23,21]})
journeys
:
restaurant_orders
:
grocery_orders
:
I want to bring the item_count
to the journeys
dataframe.
The naive way is to filter the journeys
dataframe based on order_type
column, perform the join each filtered dataframe and finally concat them back together.
Is there a Polars idiomatic way to perform a conditional (polymorphic?) join to the journeys
dataframe based on the value of order_type
.
I don't think it is possible to join on a condition with polars
.
One way to achieve the desired result is to concatenate the two dataframe to join and add them a column order_type
.
restaurant_orders = restaurant_orders.with_columns(order_type = pl.lit("restaurant"))
grocery_orders = grocery_orders.with_columns(order_type = pl.lit("grocery"))
orders = pl.concat([restaurant_orders, grocery_orders])
journeys.join(orders, left_on=['order_id', 'order_type'], right_on=['id', 'order_type'], how='left')
If you have multiple dataframe to concatenate like this, you can use a function to do it based on the name of the DataFrame:
def concat_df(list_df: list) -> pl.DataFrame:
list_newdf = []
for df in list_df:
name = [x for x in globals() if globals()[x] is df][0]
name = name[:-7]
list_newdf.append(df.with_columns(order_type = pl.lit(name)))
return pl.concat(list_newdf)
list_df = [restaurant_orders, grocery_orders]
orders = concat_df(list_df)
journeys.join(orders, left_on=['order_id', 'order_type'], right_on=['id', 'order_type'], how='left')
It will produce the same result. Note that this second solution is dependent of the DataFrame name.