I was able to find a couple of similar questions to this one, but I don't believe they answer what I am trying to accomplish.
Quite simply, I am trying to use rolling.min with a variable window length taken from another column in the DataFrame. My dataset will probably get pretty large in the future, so I am trying to pull this off without a loop.
pvdata['templow'] = pvdata['low'].rolling(window = pvdata['counter']).min()
Running this, I get: ValueError: window must be an integer 0 or greater. I believe the issue is that I am passing a Series instead of an integer for the window length, so my next attempt was to use apply with a lambda:
pvdata['templow'] = pvdata['counter'].apply(lambda x: pvdata['low'].rolling(window = x).min())
This returned the error: ValueError: Expected a 1D array, got an array with shape (10989, 10989), which I cannot explain.
Here is an example of the output I am looking for:
Is this possible to do without a loop?
rolling does not accept a Series as its window, so you cannot pass a per-row window size directly. Depending on how many unique window sizes there are in the counter column, it may be faster to call rolling once for each size and combine the results.
import numpy as np
import pandas as pd

# Mock data
n = 1_000_000
m = 10  # The number of unique values in `counter`
np.random.seed(42)
low = np.random.randint(1, 100, n)
counter = np.random.choice(np.arange(1, m + 1), n)
pvdata = pd.DataFrame({"low": low, "counter": counter})
# Fast method: one rolling pass per unique window size, then pick each row's column
codes, uniques = pd.factorize(pvdata["counter"], sort=True)
arr = np.vstack([pvdata["low"].rolling(u, min_periods=1).min() for u in uniques]).T
templow = arr[np.arange(len(pvdata)), codes]
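# Loop method, for comparison
n = len(pvdata)
low = pvdata["low"].to_numpy()
counter = pvdata["counter"].to_numpy()
templow = np.full(n, np.nan)
for i, c in enumerate(counter):
    lb = max(0, i - c + 1)  # first row of the trailing window, clipped at 0
    ub = i + 1  # slice end is exclusive, so row i is included
    templow[i] = low[lb:ub].min()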
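To see that the two approaches agree, here is a small sketch on a five-row frame that can be checked by hand. The values are made up for illustration, and the names fast and slow are only labels for this example.

```python
import numpy as np
import pandas as pd

# Tiny hand-checkable frame; the values are made up for illustration.
pvdata = pd.DataFrame({"low": [5, 3, 8, 1, 9], "counter": [1, 2, 3, 2, 1]})

# Fast method: one rolling pass per unique window size, then pick per row.
codes, uniques = pd.factorize(pvdata["counter"], sort=True)
arr = np.vstack(
    [pvdata["low"].rolling(int(u), min_periods=1).min() for u in uniques]
).T
fast = arr[np.arange(len(pvdata)), codes]

# Loop method: take the min of the trailing slice for each row.
low = pvdata["low"].to_numpy()
slow = np.array(
    [low[max(0, i - c + 1) : i + 1].min() for i, c in enumerate(pvdata["counter"])],
    dtype=float,
)

assert np.array_equal(fast, slow)
print(fast)  # [5. 3. 3. 1. 9.]
```

Row 2 has counter 3, so it looks back over 5, 3, 8 and returns 3. Row 3 has counter 2, so it looks back over 8, 1 and returns 1.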
As m increases, the fast method uses more CPU and memory because it makes more rolling calls and arr requires n x m x 8 bytes of memory.
The "fast" method becomes slower as m increases. During my test, both methods were about equal at m = 150, after which "fast" became the slower one. Experiment with your data and pick the algorithm that suits your needs.
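If the n x m x 8 bytes for arr are the main concern (about 1.2 GB at n = 1,000,000 and m = 150), one possible variation is to keep only one rolling result in memory at a time and copy out the rows that asked for that window size. This is a sketch, not something benchmarked here: variable_rolling_min is a hypothetical helper name, and it still makes one rolling call per unique size, so it addresses the memory cost rather than the CPU cost.

```python
import numpy as np
import pandas as pd


def variable_rolling_min(low: pd.Series, counter: pd.Series) -> np.ndarray:
    """Trailing min of `low` with a per-row window length from `counter`.

    Hypothetical helper, not part of pandas. It holds one rolling result
    at a time instead of stacking all of them into an n x m array.
    """
    sizes = counter.to_numpy()
    out = np.full(len(low), np.nan)
    for u in np.unique(sizes):
        rolled = low.rolling(int(u), min_periods=1).min().to_numpy()
        mask = sizes == u  # rows that asked for this window size
        out[mask] = rolled[mask]
    return out


# Smaller mock data than above so the example runs quickly.
rng = np.random.default_rng(0)
n, m = 10_000, 10
pvdata = pd.DataFrame(
    {"low": rng.integers(1, 100, n), "counter": rng.integers(1, m + 1, n)}
)
templow = variable_rolling_min(pvdata["low"], pvdata["counter"])

# Size of the stacked array for n = 1,000,000 and m = 150.
print(1_000_000 * 150 * 8 / 1e9, "GB")  # 1.2 GB
```

Whether this beats the plain loop for a large m depends on the data, so it is worth timing all the variants on a realistic sample.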