Search code examples
pythonpandasvectorizationdatetimeindex

How to vectorize Pandas DateTimeIndex


I'm trying to avoid a for loop with DateTimeIndex. I have a function get_latest that looks up the most recent wage index value. When I step through the dates of pay days, the lookup works fine. When I attempt to vectorize the operation, I'm advised TypeError: 'numpy.ndarray' object is not callable. I've tried all manners of dt, date, to_pydatetime, etc. to no avail.

# payroll.py
# %%
import pandas as pd
import datetime

# %%
def get_latest(date, series):
    return series.loc[max([x for x in series.index if x <= date])]

# %%
start_rate = 1000.0
start = '2025-08-01'
end = '2026-02-01'

raise_series = pd.Series ({
    datetime.date(2024, 10, 1) : 0.06,
    datetime.date(2025, 10, 1) : 0.05,
    datetime.date(2026, 1, 1) : 1.055,
    datetime.date(2026, 10, 1) : 0.04,
    datetime.date(2027, 10, 1) : 0.04,
    datetime.date(2028, 10, 1) : 0.04,
    datetime.date(2029, 10, 1) : 0.04
})

# %%
initial_index = pd.Series ({
    datetime.date(1, 1, 1): 0.00
})

index_series = pd.concat([initial_index, raise_series], axis=0)
index_series += 1
index_series = index_series.cumprod()

# %%
pay_days = pd.date_range(start=start, end=end, freq='2W')

# %%
gross = []
for row in pay_days:
    gross.append(get_latest(row.date(), index_series) * start_rate) 
pay_days_gross1 = pd.Series(gross)
# %%
pay_days_gross2 = get_latest(pay_days.date(), index_series) * start_rate
# %%

Solution

  • Here's one approach:

    pay_days_gross = (raise_series.add(1)
                      .cumprod()
                      .reindex(pay_days, 
                               method='ffill', 
                               fill_value=1)
                      .mul(start_rate)
                      )
    

    Output:

    2025-08-03    1060.000
    2025-08-17    1060.000
    2025-08-31    1060.000
    2025-09-14    1060.000
    2025-09-28    1060.000
    2025-10-12    1113.000
    2025-10-26    1113.000
    2025-11-09    1113.000
    2025-11-23    1113.000
    2025-12-07    1113.000
    2025-12-21    1113.000
    2026-01-04    2287.215
    2026-01-18    2287.215
    2026-02-01    2287.215
    Freq: 2W-SUN, dtype: float64
    

    Equality check OP's method:

    pay_days_gross1.set_axis(pay_days_gross.index).equals(pay_days_gross)
    
    # True
    

    Explanation / Intermediates

    • Use Series.add to add 1, turning percentages into multipliers, and Series.cumprod to calculate their cumulative effect over time.
    raise_series.add(1).cumprod()
    
    2024-10-01    1.060000
    2025-10-01    1.113000
    2026-01-01    2.287215
    2026-10-01    2.378704
    2027-10-01    2.473852
    2028-10-01    2.572806
    2029-10-01    2.675718
    dtype: float64
    
    • Use Series.reindex with pay_days, using method='ffill' to forward-fill values and fill_value=1 to set the multiplier to 1 if start precedes the first date in raise_series.
    • Multiply by start_rate with Series.mul.