I have a dataframe with stock price history of companies by date.
print(prices)
shape: (5, 4)
┌──────┬─────┬───────┬──────────┐
│ date ┆ ibm ┆ apple ┆ coinbase │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞══════╪═════╪═══════╪══════════╡
│ 1 ┆ 50 ┆ 100 ┆ 40 │
│ 2 ┆ 48 ┆ 200 ┆ 60 │
│ 3 ┆ 49 ┆ 400 ┆ 10 │
│ 4 ┆ 27 ┆ 800 ┆ 300 │
│ 5 ┆ 11 ┆ 1600 ┆ 7000 │
└──────┴─────┴───────┴──────────┘
And another with prices shifted by 2 date units.
prices_2 = prices.shift(-2)
shape: (5, 4)
┌──────┬──────┬───────┬──────────┐
│ date ┆ ibm ┆ apple ┆ coinbase │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞══════╪══════╪═══════╪══════════╡
│ 3 ┆ 49 ┆ 400 ┆ 10 │
│ 4 ┆ 27 ┆ 800 ┆ 300 │
│ 5 ┆ 11 ┆ 1600 ┆ 7000 │
│ null ┆ null ┆ null ┆ null │
│ null ┆ null ┆ null ┆ null │
└──────┴──────┴───────┴──────────┘
I want to bundle the 2 ibm
series into struct
s in a single ibm
column, with a field for the price @ date, and the shift(-2) price. Similar for apple
etc.
How can I do this? Would it be efficient?
Don't know where to start, as it seems the resulting ibm
column would not consist of 2 series, but a single series of pairs of (original, shifted) prices
pl.struct
takes any number of expressions, allowing for putting shift
expressions directly into it:
df.with_columns(
pl.struct(col, pl.col(col).shift(-2).suffix('2'))
for col in df.columns
if col != 'date'
)
shape: (5, 4)
┌──────┬───────────┬─────────────┬─────────────┐
│ date ┆ ibm ┆ apple ┆ coinbase │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ struct[2] ┆ struct[2] ┆ struct[2] │
╞══════╪═══════════╪═════════════╪═════════════╡
│ 1 ┆ {50,49} ┆ {100,400} ┆ {40,10} │
│ 2 ┆ {48,27} ┆ {200,800} ┆ {60,300} │
│ 3 ┆ {49,11} ┆ {400,1600} ┆ {10,7000} │
│ 4 ┆ {27,null} ┆ {800,null} ┆ {300,null} │
│ 5 ┆ {11,null} ┆ {1600,null} ┆ {7000,null} │
└──────┴───────────┴─────────────┴─────────────┘
Here the underlying fields would be ibm
and ibm2
for the ibm
column (see suffix
), etc., you can use alias
or the like to rename the columns/fields whatever you want.
df.with_columns(...).schema
{'date': Int64, 'ibm': Struct([Field('ibm', Int64), Field('ibm2', Int64)]), 'apple': Struct([Field('apple', Int64), Field('apple2', Int64)]), 'coinbase': Struct([Field('coinbase', Int64), Field('coinbase2', Int64)])}