I have a pandas dataframe which includes a timestamp and 71 other columns, something like this:
timestamp |close_price|highest_price|volume| ...
2018-09-29 00:00:20 |1809 |1811 | ... |
2018-09-29 00:00:34 |1823 |1832 |
2018-09-29 00:00:59 |1832 |1863 |
2018-09-29 00:01:09 |1800 |1802 |
2018-09-29 00:01:28 |1832 |1845 |
.
.
.
I want to put the data into 10 min intervals and I want to do separate operations on each column, for example I want the 10 min intervals of close_price
column to show the last
value of the corresponding range in the real table, or for the highest_price
column, I want the max
value of the corresponding range, or for volume
I want the mean
of the values in that range. I already tried
dataTable = datefram.resample("10min").agg({'first_price':'first',
'close_price':'last',
'highest_price': 'max',
'volume':'mean',
#other attributes...
})
but the result seems to be incorrect. Is there any other ways to do what I want to do? I will appreciate any comments or thoughts.
Note that there is no specific pattern in timestamp values. In 1 minute, we can have 0 to 60 rows.
Your approach is correct. The
dataframe.resample("10min").agg()
does the calculations for you.
You might get more outputs than what you expect and that is because of this: resample
method continuously adds 10 minutes to the time and does the calculations that you asked. But if there was no data in any of the 10 min
intervals, it creates a NULL
row. Maybe your data is not continuous and causes this Null
rows.
You can simply delete the NULL
rows by using dataframe.dropna()