I have a DataFrame as follows:
Date | Time | U | V |
---|---|---|---|
2023-01-03 | 10:00:21:08 | 5 | 8 |
2023-01-03 | 10:00:25:08 | 2 | 3 |
2023-01-03 | 10:00:27:08 | 3 | 3 |
2023-01-03 | 10:00:29:08 | 6 | 6 |
2023-01-03 | 10:00:31:08 | 4 | 6 |
2023-01-03 | 10:00:35:08 | 9 | 7 |
2023-01-03 | 10:00:37:08 | 10 | 4 |
2023-01-03 | 10:00:39:08 | 11 | 12 |
I am trying to average every 2 rows and print the averaged U and V together with the Date and Time.
I have done this:
data_columns=['Date','Time','U','V']
df3= data.groupby(np.arange(len(data))//2).mean()
print(df3)
However, my numeric values are averaged and printed, but my Date and Time are not. I know that's because the groupby mean only works on numeric values, and the Date and Time columns are of object type. How do I get the Date and Time to print so that the initial time of each averaged group is shown? Like so:
Date | Time | U | V |
---|---|---|---|
2023-01-03 | 10:00:21:08 | 3.5 | 5.5 |
2023-01-03 | 10:00:27:08 | 4.5 | 4.5 |
2023-01-03 | 10:00:31:08 | 6.5 | 6.5 |
2023-01-03 | 10:00:37:08 | 10.5 | 8 |
** In reality it's `...np.arange(len(data))//300).mean()`
IIUC, the `groupby()` function is used with a column name or a list of column names, like `groupby('Date')` or `groupby(['Date','Time'])`.
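For the fixed-size row groups in the question, one possible sketch (my assumption, not something confirmed in the discussion) is to keep the `np.arange(len(data))//n` grouping and pass a per-column aggregation to `agg()`: `first` for Date and Time, `mean` for U and V. The sample frame is copied from the question; `n` is a hypothetical name for the group size.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
data = pd.DataFrame({
    "Date": ["2023-01-03"] * 8,
    "Time": ["10:00:21:08", "10:00:25:08", "10:00:27:08", "10:00:29:08",
             "10:00:31:08", "10:00:35:08", "10:00:37:08", "10:00:39:08"],
    "U": [5, 2, 3, 6, 4, 9, 10, 11],
    "V": [8, 3, 3, 6, 6, 7, 4, 12],
})

n = 2  # rows per group (hypothetical name; the question mentions 300 for the real data)
groups = np.arange(len(data)) // n  # 0, 0, 1, 1, 2, 2, ...

# Keep the first Date/Time of each group, average the numeric columns.
df3 = data.groupby(groups).agg(
    {"Date": "first", "Time": "first", "U": "mean", "V": "mean"}
)
print(df3)
```

With `n = 2` this should give one row per pair, carrying the Date and Time of the first row in the pair.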
UPDATE: After discussing with @doctorstrange, I would like to update the answer. The code below creates some dummy data:
import pandas as pd
from datetime import datetime
import time
import random

logfile = 'log.csv'
c = 0
# Write one random (U, V) sample per second, 101 rows in total.
# Mode 'w' (instead of 'a') so that rerunning does not add a second header row.
with open(logfile, 'w') as handle:
    handle.write('Date,Time,U,V\n')
    while c <= 100:
        now = datetime.now()
        D = now.strftime('%Y-%m-%d')
        T = now.strftime('%H:%M:%S:%f')
        u = random.randint(1, 20)
        v = random.randint(1, 20)
        handle.write(f'{D},{T},{u},{v}\n')
        c = c + 1
        time.sleep(1)
The `data` variable is a pandas DataFrame that looks like this:
| | Date | Time | U | V |
---|---|---|---|---|
0 | 2023-01-13 | 11:31:43:481765 | 15 | 8 |
1 | 2023-01-13 | 11:31:44:485698 | 13 | 11 |
2 | 2023-01-13 | 11:31:45:489411 | 7 | 11 |
3 | 2023-01-13 | 11:31:46:492592 | 6 | 11 |
4 | 2023-01-13 | 11:31:47:496405 | 18 | 17 |
... | ... | ... | ... | ... |
96 | 2023-01-13 | 11:33:19:812832 | 10 | 5 |
97 | 2023-01-13 | 11:33:20:812896 | 5 | 15 |
98 | 2023-01-13 | 11:33:21:816767 | 16 | 9 |
99 | 2023-01-13 | 11:33:22:818553 | 6 | 8 |
100 | 2023-01-13 | 11:33:23:818651 | 13 | 6 |
The data was written to `logfile`, i.e. the 'log.csv' file. Read it into a DataFrame, the `data` variable:
data=pd.read_csv(logfile)
Create a new column `timestamp` by applying `pd.Timestamp.combine()`, which combines `Date` and `Time`:
data['timestamp'] = data.apply(
    lambda x: pd.Timestamp.combine(
        datetime.strptime(x['Date'], '%Y-%m-%d'),
        datetime.strptime(x['Time'], '%H:%M:%S:%f').time(),
    ),
    axis=1,
)
You will get this:
| | Date | Time | U | V | timestamp |
---|---|---|---|---|---|
0 | 2023-01-13 | 11:31:43:481765 | 15 | 8 | 2023-01-13 11:31:43.481765 |
1 | 2023-01-13 | 11:31:44:485698 | 13 | 11 | 2023-01-13 11:31:44.485698 |
2 | 2023-01-13 | 11:31:45:489411 | 7 | 11 | 2023-01-13 11:31:45.489411 |
3 | 2023-01-13 | 11:31:46:492592 | 6 | 11 | 2023-01-13 11:31:46.492592 |
4 | 2023-01-13 | 11:31:47:496405 | 18 | 17 | 2023-01-13 11:31:47.496405 |
... | ... | ... | ... | ... | ... |
96 | 2023-01-13 | 11:33:19:812832 | 10 | 5 | 2023-01-13 11:33:19.812832 |
97 | 2023-01-13 | 11:33:20:812896 | 5 | 15 | 2023-01-13 11:33:20.812896 |
98 | 2023-01-13 | 11:33:21:816767 | 16 | 9 | 2023-01-13 11:33:21.816767 |
99 | 2023-01-13 | 11:33:22:818553 | 6 | 8 | 2023-01-13 11:33:22.818553 |
100 | 2023-01-13 | 11:33:23:818651 | 13 | 6 | 2023-01-13 11:33:23.818651 |
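The `timestamp` column can possibly also be built without the row-wise `apply()`, by concatenating the two string columns and parsing them in one `pd.to_datetime()` call. This is only a minimal sketch, assuming the same `%H:%M:%S:%f` time layout as in the log file; the two sample rows are taken from the table above.

```python
import pandas as pd

# Two rows taken from the table above.
data = pd.DataFrame({
    "Date": ["2023-01-13", "2023-01-13"],
    "Time": ["11:31:43:481765", "11:31:44:485698"],
    "U": [15, 13],
    "V": [8, 11],
})

# Join "Date Time" as one string per row and parse the whole column at once.
data["timestamp"] = pd.to_datetime(
    data["Date"] + " " + data["Time"],
    format="%Y-%m-%d %H:%M:%S:%f",
)
print(data)
```

On a large log this tends to be faster than `apply(..., axis=1)`, since the parsing is done on the whole column rather than row by row.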
Then, since you use the `df3` variable, the code below uses the `resample()` function with `rule='min'` (every minute) on the `timestamp` column and sets the offset to `0s` (zero seconds). It then calculates the `mean()`, resets the DataFrame index, and names the resulting column `U_mean`:
df3=data.resample(rule='min', on='timestamp', offset='0s').U.mean().reset_index(name='U_mean')
df3
The first result looks like this:
| | timestamp | U_mean |
---|---|---|
0 | 2023-01-13 11:31:00 | 9.823529 |
1 | 2023-01-13 11:32:00 | 10.783333 |
2 | 2023-01-13 11:33:00 | 8.250000 |
Lastly, do the same for the `V` mean in `df4`:
df4=data.resample(rule='min', on='timestamp', offset='0s').V.mean().reset_index(name='V_mean')
df4
Then merge `df3` and `df4` on `timestamp` to get the final result:
pd.merge(df3,df4, on='timestamp')
Finally, the result is:
| | timestamp | U_mean | V_mean |
---|---|---|---|
0 | 2023-01-13 11:31:00 | 9.823529 | 11.117647 |
1 | 2023-01-13 11:32:00 | 10.783333 | 10.716667 |
2 | 2023-01-13 11:33:00 | 8.250000 | 12.541667 |
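The two resamples and the merge can probably be collapsed into one step by selecting both columns before calling `mean()`. A minimal sketch, with a small made-up frame standing in for `data` (the timestamps and values below are illustrative only, not taken from the log):

```python
import pandas as pd

# Illustrative stand-in for `data`; values are made up for the example.
data = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-13 11:31:43.481765",
        "2023-01-13 11:31:44.485698",
        "2023-01-13 11:32:05.000000",
        "2023-01-13 11:32:45.500000",
    ]),
    "U": [15, 13, 7, 6],
    "V": [8, 11, 11, 11],
})

# Resample once, average both columns, then rename them to U_mean / V_mean.
result = (
    data.resample("min", on="timestamp")[["U", "V"]]
    .mean()
    .add_suffix("_mean")
    .reset_index()
)
print(result)
```

This should produce the same `timestamp`, `U_mean`, `V_mean` layout as the merged result, one row per minute.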
Hope this helps.