Search code examples
pythonpandasdataframedatetimemanipulate

How to manipulate values in a dataframe column at specific times for each day within the dataframe


So, I have a datetime indexed dataframe that looks like this:

eventTime   Energy  Power   RunningHours
9/29/2018 0:00  146.985 65  2256.88
9/29/2018 1:00  147.05  64.5    2257.87
9/29/2018 2:00  147.116 65  2258.87
9/29/2018 3:00  147.181 65  2259.87
9/29/2018 4:00  147.246 65  2260.87
9/29/2018 5:00  147.312 65  2261.87
9/29/2018 5:11  76.428      
9/29/2018 5:12      65  
9/29/2018 6:00  147.377 65  2262.87
9/29/2018 7:00  147.443 65  2263.87
9/29/2018 8:00  147.45      2263.98
9/29/2018 9:17  76.558      
9/29/2018 9:17          1174.35
9/29/2018 19:00 147.502 65  2264.75
9/29/2018 20:00 147.567 65  2265.75
9/29/2018 21:00 147.633 65  2266.75
9/29/2018 22:00 147.698 65  2267.75
9/29/2018 23:00 147.764 65  2268.75
9/30/2018 0:00  147.829 65  2269.75
9/30/2018 1:00  147.895 65  2270.75
9/30/2018 2:00  147.961 65  2271.75
9/30/2018 3:00  148.026 65  2272.73
9/30/2018 4:00  148.092 65  2273.73
9/30/2018 5:00  148.157 65  2274.73
9/30/2018 6:00  148.223 65  2275.73
9/30/2018 7:00  148.288 65  2276.73
9/30/2018 8:00  148.297     2276.87
9/30/2018 13:51     64  
9/30/2018 19:00 148.35  65  2277.68
9/30/2018 20:00 148.415 65  2278.67
9/30/2018 21:00 148.481 65  2279.67
9/30/2018 22:00 148.546 65  2280.67
9/30/2018 23:00 148.611 65  2281.67

For each day in the datetime index, I am looking to find the difference between "RunningHours" value at 23 hours and 0 hours.

I am imagining my output to look like

9/29/2018   11.87
9/30/2018   11.92

How do I get to this. I am currently disaggregating the datetime index to date and time, then looping down date and time to find the difference. Seems complicated for something very simple and I am sure there is an easier way using the datetime index as is. I just don't know how. Help please.

@ansev Your code works very well for data that is continuous and where the information exists for 00: and 23:00 timestamps. However, if data is missing for these 2 timestamps, the script picks up the first available or the last available datapoint for this date.

For. e.g.: For the data below

6/7/2018 0:00   67.728  64  1037.82
6/7/2018 1:00   67.793  64  1038.82
6/7/2018 2:00   67.857  64  1039.82
6/7/2018 3:00   67.922  64  1040.82
6/7/2018 4:00   67.987  64  1041.82
6/7/2018 5:00           64  1042.82
6/7/2018 6:00               1043.43
6/7/2018 23:00  68.288      

The output from the script is

6/7/2018    1037.82 1043.43 5.61

How do I modify it to say NaN if data is not available ? Thanks so much for your help on this.


Solution

  • Find values from a column in a DF at very specific times for every unique date

    I answered my own question here for those that are looking for something different.