
Not able to apply a custom function in Polars


I have a custom function that worked fine in pandas, but it does not work with the Polars map_elements() function now that I am migrating to Polars.

Below is my code.

df = df.sort("time", descending=False)
latest_day = df[-1,'time']
def fetch_price(self, date): # the custom function
    if date <= latest_day:
        return df[date,self.name]
    else:
        return None
def get_rate(self, direction, margin, test_cycle):
    # retrieve relevant data from df
    self.df = (df[['time', self.name]].drop_nulls())[
              test_cycle * -1:]  # test_cycle * -1; test_cycle is greater than life_cycle.
    # apply next_trade_day to 'time' column, and create a new column 'Maturity_Date'
    self.df = self.df.with_columns(
        (pl.col('time').map_elements(lambda x: next_trade_day(x, self.life_cycle))).alias('Maturity_Date'))
    # check if the generated dates are in the original time series
    self.df = self.df.with_columns(pl.col('Maturity_Date').is_in(df['time']).cast(bool).alias('Data_Accessibility'))
    # maturity price on the day of maturity
    # apply fetch_price to 'Maturity_Date' column if 'Data_Accessibility' is True, and create a new column 'Maturity_Price'
    # self.df = self.df.with_columns((pl.col('Maturity_Date').map_elements(lambda x: self.fetch_price(x))).alias('Maturity_Price'))
    self.df = self.df.with_columns(pl.when(pl.col('Maturity_Date') <= pl.col('time').last()).then(pl.col(self.name)).alias('Maturity_Price'))
    print(self.df)

CU = future('CU8888.SHF', 33)
CU.get_rate(1, 0.05, 8000)

This is the output of print(self.df):

┌────────────┬────────────┬───────────────┬────────────────────┐
│ time       ┆ CU8888.SHF ┆ Maturity_Date ┆ Data_Accessibility │
│ ---        ┆ ---        ┆ ---           ┆ ---                │
│ date       ┆ f64        ┆ date          ┆ bool               │
╞════════════╪════════════╪═══════════════╪════════════════════╡
│ 2004-01-02 ┆ 23275.0    ┆ 2004-02-04    ┆ true               │
│ 2004-01-02 ┆ 23275.0    ┆ 2004-02-04    ┆ true               │
│ 2004-01-02 ┆ 23275.0    ┆ 2004-02-04    ┆ true               │
│ 2004-01-02 ┆ 23275.0    ┆ 2004-02-04    ┆ true               │
│ …          ┆ …          ┆ …             ┆ …                  │
│ 2023-03-23 ┆ 68274.0    ┆ 2023-04-25    ┆ false              │
│ 2023-03-24 ┆ 69670.0    ┆ 2023-04-26    ┆ false              │
│ 2023-03-27 ┆ 69161.0    ┆ 2023-05-04    ┆ false              │
│ 2023-03-28 ┆ 69244.0    ┆ 2023-05-04    ┆ false              │
└────────────┴────────────┴───────────────┴────────────────────┘
    

Solution

  • The main issue is that in pandas you had a datetime index, so you could use df.loc[date, ...] to do a "row lookup".

    Polars has no .loc and no index. However, the .apply + .loc approach you were using essentially mimics a left join, so the same lookup can be written as a join of the frame with itself, matching Maturity_Date against time:

    df = pl.from_repr("""
    ┌────────────┬────────────┬───────────────┬────────────────────┐
    │ time       ┆ CU8888.SHF ┆ Maturity_Date ┆ Data_Accessibility │
    │ ---        ┆ ---        ┆ ---           ┆ ---                │
    │ date       ┆ f64        ┆ date          ┆ bool               │
    ╞════════════╪════════════╪═══════════════╪════════════════════╡
    │ 2004-01-02 ┆ 23275.0    ┆ 2004-02-04    ┆ true               │
    │ 2004-01-02 ┆ 23275.0    ┆ 2004-02-04    ┆ true               │
    │ 2004-01-02 ┆ 23275.0    ┆ 2004-02-04    ┆ true               │
    │ 2004-01-02 ┆ 23275.0    ┆ 2004-02-04    ┆ true               │
    │ 2004-02-04 ┆ 99999.0    ┆ 2004-05-10    ┆ false              │
    │ 2023-03-23 ┆ 68274.0    ┆ 2023-04-25    ┆ false              │
    │ 2023-03-24 ┆ 69670.0    ┆ 2023-04-26    ┆ false              │
    │ 2023-03-27 ┆ 69161.0    ┆ 2023-05-04    ┆ false              │
    │ 2023-03-28 ┆ 69244.0    ┆ 2023-05-04    ┆ false              │
    └────────────┴────────────┴───────────────┴────────────────────┘
    """)
    
    df.join(
       df.select("time", Maturity_Price = "CU8888.SHF"), 
       left_on="Maturity_Date", 
       right_on="time", 
       how="left"
    )
    
    shape: (9, 5)
    ┌────────────┬────────────┬───────────────┬────────────────────┬────────────────┐
    │ time       ┆ CU8888.SHF ┆ Maturity_Date ┆ Data_Accessibility ┆ Maturity_Price │
    │ ---        ┆ ---        ┆ ---           ┆ ---                ┆ ---            │
    │ date       ┆ f64        ┆ date          ┆ bool               ┆ f64            │
    ╞════════════╪════════════╪═══════════════╪════════════════════╪════════════════╡
    │ 2004-01-02 ┆ 23275.0    ┆ 2004-02-04    ┆ true               ┆ 99999.0        │
    │ 2004-01-02 ┆ 23275.0    ┆ 2004-02-04    ┆ true               ┆ 99999.0        │
    │ 2004-01-02 ┆ 23275.0    ┆ 2004-02-04    ┆ true               ┆ 99999.0        │
    │ 2004-01-02 ┆ 23275.0    ┆ 2004-02-04    ┆ true               ┆ 99999.0        │
    │ …          ┆ …          ┆ …             ┆ …                  ┆ …              │
    │ 2023-03-23 ┆ 68274.0    ┆ 2023-04-25    ┆ false              ┆ null           │
    │ 2023-03-24 ┆ 69670.0    ┆ 2023-04-26    ┆ false              ┆ null           │
    │ 2023-03-27 ┆ 69161.0    ┆ 2023-05-04    ┆ false              ┆ null           │
    │ 2023-03-28 ┆ 69244.0    ┆ 2023-05-04    ┆ false              ┆ null           │
    └────────────┴────────────┴───────────────┴────────────────────┴────────────────┘