I have two pl.DataFrame
s:
from datetime import date
import polars as pl
df1 = pl.DataFrame(
{
"symbol": [
"sec1", "sec1", "sec1", "sec1", "sec1", "sec1",
"sec2", "sec2", "sec2", "sec2", "sec2",
],
"date": [
date(2021, 9, 14),
date(2021, 9, 15),
date(2021, 9, 16),
date(2021, 9, 17),
date(2021, 8, 31),
date(2020, 12, 31),
date(2021, 9, 14),
date(2021, 9, 15),
date(2021, 8, 31),
date(2021, 12, 30),
date(2020, 12, 31),
],
"price": range(11),
}
)
df2 = pl.DataFrame(
{
"symbol": ["sec1", "sec2"],
"current_date": [date(2021, 9, 17), date(2021, 9, 15)],
"mtd": [date(2021, 8, 31), date(2021, 8, 31)],
"ytd": [date(2020, 12, 31), date(2020, 12, 30)],
}
)
with pl.Config(tbl_rows=-1):
print(df1)
print(df2)
shape: (11, 3)
┌────────┬────────────┬───────┐
│ symbol ┆ date ┆ price │
│ --- ┆ --- ┆ --- │
│ str ┆ date ┆ i64 │
╞════════╪════════════╪═══════╡
│ sec1 ┆ 2021-09-14 ┆ 0 │
│ sec1 ┆ 2021-09-15 ┆ 1 │
│ sec1 ┆ 2021-09-16 ┆ 2 │
│ sec1 ┆ 2021-09-17 ┆ 3 │
│ sec1 ┆ 2021-08-31 ┆ 4 │
│ sec1 ┆ 2020-12-31 ┆ 5 │
│ sec2 ┆ 2021-09-14 ┆ 6 │
│ sec2 ┆ 2021-09-15 ┆ 7 │
│ sec2 ┆ 2021-08-31 ┆ 8 │
│ sec2 ┆ 2021-12-30 ┆ 9 │
│ sec2 ┆ 2020-12-31 ┆ 10 │
└────────┴────────────┴───────┘
shape: (2, 4)
┌────────┬──────────────┬────────────┬────────────┐
│ symbol ┆ current_date ┆ mtd ┆ ytd │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ date ┆ date ┆ date │
╞════════╪══════════════╪════════════╪════════════╡
│ sec1 ┆ 2021-09-17 ┆ 2021-08-31 ┆ 2020-12-31 │
│ sec2 ┆ 2021-09-15 ┆ 2021-08-31 ┆ 2020-12-30 │
└────────┴──────────────┴────────────┴────────────┘
I need to filter the prices of df1
for each group with the respective dates from df2
. I need to incorporate all columns of type date
. The number of these columns in df2
might not be fixed.
I am looking for the following result:
shape: (11, 3)
┌────────┬────────────┬───────┐
│ symbol ┆ date ┆ price │
│ --- ┆ --- ┆ --- │
│ str ┆ date ┆ i64 │
╞════════╪════════════╪═══════╡
│ sec1 ┆ 2021-09-17 ┆ 3 │
│ sec1 ┆ 2021-08-31 ┆ 4 │
│ sec1 ┆ 2020-12-31 ┆ 5 │
│ sec2 ┆ 2021-09-15 ┆ 7 │
│ sec2 ┆ 2021-08-31 ┆ 8 │
│ sec2 ┆ 2020-12-30 ┆ 9 │
└────────┴────────────┴───────┘
I was thinking of filtering df1
by symbol
and then do a join operation for every individual date
column of df2
. I would then subsequently concatenate the resulting dataframes. However, there's probably a much more elegant solution.
df1.join(
df2.unpivot(index='symbol', value_name='date').drop('variable'),
on=['symbol', 'date'],
how='inner',
)
Output:
┌────────┬────────────┬───────┐
│ symbol ┆ date ┆ price │
│ --- ┆ --- ┆ --- │
│ str ┆ date ┆ i64 │
╞════════╪════════════╪═══════╡
│ sec1 ┆ 2021-09-17 ┆ 3 │
│ sec1 ┆ 2021-08-31 ┆ 4 │
│ sec1 ┆ 2020-12-31 ┆ 5 │
│ sec2 ┆ 2021-09-15 ┆ 7 │
│ sec2 ┆ 2021-08-31 ┆ 8 │
└────────┴────────────┴───────┘