python pandas for-loop zip sliding-window

Sliding windows - measuring length of observations on each looped window

Consider this sample code, where zip() is used to build sliding windows over a dataset and yield one window per loop iteration.

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']

for x, y in zip(months, months[1:]):
    print(x, y)

# Output of each window will be:
Jan Feb 
Feb Mar
Mar Apr
Apr May

Suppose I now want to calculate, for each window, what percentage of the window's total length the first month accounts for.

Example in steps:

For the first window (Jan Feb), calculate the length of Jan as a percentage of the full window (Jan + Feb) and store it in a new variable.
For the second window (Feb Mar), calculate the length of Feb as a percentage of the full window (Feb + Mar) and store it in a new variable.
Continue this way until the last window.

Any suggestions on how I might implement this idea in the for loop are welcome!

Thank you!

EDIT

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']

for x, y, z in zip(months, months[1:], months[2:]):
    print(x, y, z)

# Output of each window will be:
Jan Feb Mar
Feb Mar Apr
Mar Apr May

The goal is to calculate the combined length of the first two months in each window as a fraction of the full window length:

1st window: (Jan + Feb) / (Jan + Feb + Mar)
2nd window: (Feb + Mar) / (Feb + Mar + Apr)
and so on until the last window

We can now calculate one month over the size of each window (with start.month). How can this be adapted so that the numerator covers more than one month?

Also, instead of using days_in_month, is there a way to use the number of datapoints (rows) in each month?

By number of datapoints (rows) I mean that each month contains many timestamped rows at a fixed resolution (e.g., one row every 60 minutes), so one day in a month contributes 24 datapoints (rows). Example:

                         column
rows             
01-Jan-2010 T00:00        value
01-Jan-2010 T01:00        value
01-Jan-2010 T02:00        value
...                       ...
01-Jan-2010 T23:00        value
02-Jan-2010 T00:00        value
...                       ...
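
To make the row-count idea concrete, here is a minimal sketch. It assumes the data sits in a DataFrame with an hourly DatetimeIndex. The frame df, the placeholder column name and the Jan-May 2010 range are made up for illustration, as are the window and lead settings. Rows are counted per month, then the first `lead` months of each window are divided by the row count of the whole window.

```python
import pandas as pd

# Hypothetical hourly data for Jan-May 2010 (illustrative only).
idx = pd.date_range('2010-01-01 00:00', '2010-05-31 23:00',
                    freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'column': range(len(idx))}, index=idx)

# Number of rows (datapoints) in each calendar month.
counts = df.groupby(df.index.to_period('M')).size()

window = 3  # months per sliding window (assumed)
lead = 2    # months counted in the numerator (assumed)

fractions = []
for i in range(len(counts) - window + 1):
    chunk = counts.iloc[i:i + window]
    # rows in the first `lead` months over rows in the whole window
    frac = chunk.iloc[:lead].sum() / chunk.sum()
    fractions.append(frac)
    print(chunk.index[0], chunk.index[-1], round(frac, 3))
```

Because the counts come from the data itself, months with missing rows would weigh less than their calendar length, which may or may not be what is wanted.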

Thank you!

Solution

Here is one way. (In my case, months is a PeriodIndex created with pd.period_range.)

import pandas as pd
months = pd.period_range(start='2020-01', periods=5, freq='M')

Now iterate over consecutive pairs of months. Each iteration is a two-month window.

# print header labels
print('{:10s} {:10s} {:>10s} {:>10s} {:>10s} {:>10s} '.format(
    'start', 'end', 'month', 'front (d)', 'total (d)', 'frac'))

for start, end in zip(months, months[1:]):
    front_month = start.month

    # number of days in first month (e.g., Jan)
    front_month_days = start.days_in_month

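    # number of days in current sliding window (e.g., Jan + Feb);
    # measured up to the start of the month after `end`, because end.end_time
    # falls just short of midnight and .days would truncate to one day less
    days_in_curr_window = ((end + 1).start_time - start.start_time).days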

    frac = front_month_days / days_in_curr_window

    print('{:10s} {:10s} {:10d} {:10d} {:10d} {:10.3f}'.format(
        str(start), str(end), front_month,
        front_month_days, days_in_curr_window, frac))


start      end             month  front (d)  total (d)       frac 
2020-01    2020-02             1         31         60      0.517
2020-02    2020-03             2         29         60      0.483
2020-03    2020-04             3         31         61      0.508
2020-04    2020-05             4         30         61      0.492
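
The same idea can be extended to the three-month windows from the EDIT, with two months in the numerator. The following is only a sketch: the helper leading_fraction and its window and lead parameters are hypothetical names chosen for illustration, not part of pandas.

```python
import pandas as pd

months = pd.period_range(start='2020-01', periods=5, freq='M')


def leading_fraction(months, window=3, lead=2):
    """For each sliding window of `window` months, return
    (first month, last month, days in first `lead` months / days in window)."""
    days = [m.days_in_month for m in months]
    results = []
    for i in range(len(months) - window + 1):
        lead_days = sum(days[i:i + lead])     # e.g., Jan + Feb
        total_days = sum(days[i:i + window])  # e.g., Jan + Feb + Mar
        results.append((str(months[i]), str(months[i + window - 1]),
                        lead_days / total_days))
    return results


for first, last, frac in leading_fraction(months):
    print('{:10s} {:10s} {:10.3f}'.format(first, last, frac))
```

For the 2020 months above, the three fractions are 60/91, 60/90 and 61/92. Calling leading_fraction(months, window=2, lead=1) should reproduce the frac column of the two-month table.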