Let's analyse this sample code, where zip() is used to build sliding windows over a dataset and a loop prints each window.
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
for x, y in zip(months, months[1:]):
    print(x, y)
# Output of each window will be:
Jan Feb
Feb Mar
Mar Apr
Apr May
Now suppose I want to calculate, for each window, what percentage of the window's total length each of its months accounts for.
Example in steps:
Any suggestions on how I might implement this idea in the for loop are welcome!
Thank you!
EDIT
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
for x, y, z in zip(months, months[1:], months[2:]):
    print(x, y, z)
# Output of each window will be:
Jan Feb Mar
Feb Mar Apr
Mar Apr May
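For windows of any width, the same zip() idea generalises by zipping as many shifted slices as the window has elements. A minimal sketch, assuming months is a plain list; the helper name sliding_windows is hypothetical and not part of any library:

```python
def sliding_windows(items, size):
    """Return consecutive windows of `size` items (hypothetical helper)."""
    # One shifted slice per window position; zip stops at the shortest
    # slice, so only complete windows are returned.
    return list(zip(*(items[i:] for i in range(size))))


months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
for window in sliding_windows(months, 3):
    print(*window)
```

With size=2 this gives the same two-month pairs as the first loop above.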
The goal is to calculate the combined length of two months in each window as a fraction of the full window length:
We can now calculate one month over the size of each window (with start.month). However, how do we adapt this to include more than one month?
Also, instead of using days_in_month, would there be a way to use the length of the datapoints (rows) in each month?
By length of datapoints (rows) I mean that each month contains many datapoints at a fixed time resolution (e.g., one every 60 minutes), so one day in a month contributes 24 datapoints (rows). Example:
column
rows
01-Jan-2010 T00:00 value
01-Jan-2010 T01:00 value
01-Jan-2010 T02:00 value
... ...
01-Jan-2010 T23:00 value
02-Jan-2010 T00:00 value
... ...
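To make the row-count idea concrete, here is a rough sketch that assumes the data sits in a pandas DataFrame with an hourly DatetimeIndex. The frame df, its value column, and the Jan-Mar 2010 date range are made up for illustration:

```python
import pandas as pd

# Hypothetical hourly data covering Jan-Mar 2010, one row per hour
index = pd.date_range('2010-01-01 00:00', '2010-03-31 23:00',
                      freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'value': range(len(index))}, index=index)

# Number of rows in each calendar month, indexed by monthly Period
rows_per_month = df.groupby(df.index.to_period('M')).size()

fractions = []
months = rows_per_month.index
for start, end in zip(months, months[1:]):
    front_rows = rows_per_month.loc[start]
    window_rows = front_rows + rows_per_month.loc[end]
    # Share of the window's rows that belong to its first month
    fractions.append((str(start), str(end), front_rows / window_rows))

for start, end, frac in fractions:
    print(start, end, round(frac, 3))
```

Counting rows rather than calendar days means that gaps or duplicates in the data change the fractions, which may or may not be what is wanted.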
Thank you!
Here is one way. (In my case, months is a PeriodIndex created with pd.period_range.)
import pandas as pd
months = pd.period_range(start='2020-01', periods=5, freq='M')
Now iterate over the range. Each iteration covers one two-month window.
# print header labels
print('{:10s} {:10s} {:>10s} {:>10s} {:>10s} {:>10s}'.format(
    'start', 'end', 'month', 'front (d)', 'total (d)', 'frac'))
for start, end in zip(months, months[1:]):
    front_month = start.month
    # number of days in first month (e.g., Jan)
    front_month_days = start.days_in_month
    # number of days in current sliding window (e.g., Jan + Feb),
    # measured up to the start of the following period so that the
    # last day of the window is counted in full
    days_in_curr_window = ((end + 1).start_time - start.start_time).days
    frac = front_month_days / days_in_curr_window
    print('{:10s} {:10s} {:10d} {:10d} {:10d} {:10.3f}'.format(
        str(start), str(end), front_month,
        front_month_days, days_in_curr_window, frac))
start      end             month  front (d)  total (d)       frac
2020-01    2020-02             1         31         60      0.517
2020-02    2020-03             2         29         60      0.483
2020-03    2020-04             3         31         61      0.508
2020-04    2020-05             4         30         61      0.492
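The same loop might be extended to three-month windows in which the first two months are measured against the whole window, as the edit to the question asks. This is only a sketch under that reading of the question; the names first, second, third and results are mine:

```python
import pandas as pd

months = pd.period_range(start='2020-01', periods=5, freq='M')

results = []
# Three shifted views of `months` give one three-month window per step
for first, second, third in zip(months, months[1:], months[2:]):
    # days in the first two months of the window (e.g., Jan + Feb)
    front_days = first.days_in_month + second.days_in_month
    # days in the whole window (e.g., Jan + Feb + Mar)
    window_days = front_days + third.days_in_month
    results.append((str(first), str(third), front_days, window_days))

for first, third, front_days, window_days in results:
    frac = front_days / window_days
    print('{:10s} {:10s} {:10d} {:10d} {:10.3f}'.format(
        first, third, front_days, window_days, frac))
```

Summing days_in_month over the months of interest avoids any timestamp arithmetic, so it adapts to any number of months in the numerator.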