I would like to have a more systematic way to aggregate the frequencies for multiple frequency intervals.
The following dataframe contains example data representing time-frequency data. Its column index contains three levels: conditions, channels, and frequencies.
The code to generate the dataframe is as follows:
import numpy as np
import pandas as pd
pidx = pd.IndexSlice
D = np.zeros((32,2,2,6))  # timepoints, conditions, channels, frequencies
for i in range(6):
    D[:,0,0,i] = np.arange(i,i+32,1)     # C0, ch01
    D[:,0,1,i] = np.arange(i+1,i+32+1,1) # C0, ch02
    D[:,1,0,i] = np.arange(i+2,i+32+2,1) # C1, ch01
    D[:,1,1,i] = np.arange(i+3,i+32+3,1) # C1, ch02
conditions = ['C0', 'C1']
channels = ["ch{:02}".format(i) for i in np.arange(1,3)]
frequencies = np.arange(1, 7)
# columns multi index
cidx = pd.MultiIndex.from_product([conditions,channels,frequencies])
# reshape to 2D
D = D.reshape((D.shape[0], -1))
# create DataFrame
df = pd.DataFrame(D, columns=cidx)
Currently I do the following:
fbands = {
    'fb1' : [pidx[1:3]],
    'fb2' : [pidx[2:5]],
    'fb3' : [pidx[4:6]]
}
def frequencyband_mean(df, fb):
    return df.loc(axis=1)[:,:,fb].groupby(axis=1, level=[0,1]).mean()
dffbands = dict((k, frequencyband_mean(df, fbands[k])) for k in fbands)
df_result = pd.concat(dffbands, axis=1)
However, with the latter code the column index levels are not being maintained; more specifically, the first level of df_result contains the name of every frequency interval defined in fbands. I could solve this by swapping the column levels, but that seems cumbersome.
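For reference, the level swap itself can be done in one line with `reorder_levels`. Below is a minimal sketch of this approach, assuming explicit frequency lists per band (matching the label slices above) and using a transpose-based groupby, since `groupby(axis=1)` is deprecated in recent pandas:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
D = np.zeros((32, 2, 2, 6))
for i in range(6):
    D[:, 0, 0, i] = np.arange(i, i + 32)       # C0, ch01
    D[:, 0, 1, i] = np.arange(i + 1, i + 33)   # C0, ch02
    D[:, 1, 0, i] = np.arange(i + 2, i + 34)   # C1, ch01
    D[:, 1, 1, i] = np.arange(i + 3, i + 35)   # C1, ch02
conditions = ['C0', 'C1']
channels = ["ch{:02}".format(i) for i in np.arange(1, 3)]
frequencies = np.arange(1, 7)
cidx = pd.MultiIndex.from_product([conditions, channels, frequencies])
df = pd.DataFrame(D.reshape((D.shape[0], -1)), columns=cidx)

# Bands as explicit frequency lists (hypothetical names, same bands as the slices)
fbands = {'fb1': [1, 2, 3], 'fb2': [2, 3, 4, 5], 'fb3': [4, 5, 6]}

def frequencyband_mean(df, freqs):
    # select the band's frequencies, then average over the frequency level
    sub = df.loc[:, pd.IndexSlice[:, :, freqs]]
    return sub.T.groupby(level=[0, 1]).mean().T

df_result = pd.concat({k: frequencyband_mean(df, v) for k, v in fbands.items()},
                      axis=1)
# move the band names from the first to the last column level
df_result = df_result.reorder_levels([1, 2, 0], axis=1).sort_index(axis=1)
```

This keeps condition and channel as the first two column levels, with the band names as the last level, without any manual renaming.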
I would like to know whether there's a more systematic way to apply an aggregation function to multiple frequency intervals in one go, while maintaining the column index levels. Eventually, the last level of the column index should look like (fb1, fb2, fb3).

If I understood you correctly, then I'd do it like this:
fbands = {
    'fb1' : [0,3],
    'fb2' : [2,5],
    'fb3' : [4,6]
}
for co_i in df.columns.levels[0]:
    for cha_i in df.columns.levels[1]:
        for k, v in fbands.items():
            df[co_i, cha_i, k] = df[co_i, cha_i].T[v[0]:v[1]].mean()
Update: Note that the slice here is not based on the labels, hence you would actually need v[0]-1:v[1]. To make this clearer, I'd suggest you simplify your df:
D = np.zeros((32,2,2,6))
for i in range(6):
    D[:,0,0,i] = np.arange(i,i+32,1)     # C0, ch01
    D[:,0,1,i] = np.arange(i+1,i+32+1,1) # C0, ch02
    D[:,1,0,i] = np.arange(i+2,i+32+2,1) # C1, ch01
    D[:,1,1,i] = np.arange(i+3,i+32+3,1) # C1, ch02
such that df.head(3) returns:
C0 C1
ch01 ch02 ch01 ch02
1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6
0 0.0 1.0 2.0 3.0 4.0 5.0 1.0 2.0 3.0 4.0 5.0 6.0 2.0 3.0 4.0 5.0 6.0 7.0 3.0 4.0 5.0 6.0 7.0 8.0
1 1.0 2.0 3.0 4.0 5.0 6.0 2.0 3.0 4.0 5.0 6.0 7.0 3.0 4.0 5.0 6.0 7.0 8.0 4.0 5.0 6.0 7.0 8.0 9.0
2 2.0 3.0 4.0 5.0 6.0 7.0 3.0 4.0 5.0 6.0 7.0 8.0 4.0 5.0 6.0 7.0 8.0 9.0 5.0 6.0 7.0 8.0 9.0 10.0
This way, we can actually verify our expectations! I am now using fbands as a list rather than a dict, so that the ordering becomes nice (I could also have used an OrderedDict from collections).
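As a side note, in Python 3.7+ plain dicts are guaranteed to preserve insertion order, so the original dict layout would already iterate in a stable order:

```python
# Insertion order of dict keys is part of the language spec since Python 3.7,
# so iterating over fbands yields the bands in the order they were defined.
fbands = {'fb1': [1, 3], 'fb2': [2, 5], 'fb3': [4, 6]}
print(list(fbands))  # ['fb1', 'fb2', 'fb3']
```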
fbands = [
    ['fb1', [1,3]],
    ['fb2', [2,5]],
    ['fb3', [4,6]]
]
for co_i in df.columns.levels[0]:
    for cha_i in df.columns.levels[1]:
        for fi in range(len(fbands)):
            k = fbands[fi][0]
            v = fbands[fi][1]
            df[co_i, cha_i, k] = df[co_i, cha_i].T[v[0]-1:v[1]].mean()
# drop the raw frequency columns (labels 1..6) from the last level
for i in range(1, 7):
    df = df.drop(i, axis=1, level=2)
print(df.head(3))
returns:
C0 C1
ch01 ch02 ch01 ch02
fb1 fb2 fb3 fb1 fb2 fb3 fb1 fb2 fb3 fb1 fb2 fb3
0 1.0 2.5 4.0 2.0 3.5 5.0 3.0 4.5 6.0 4.0 5.5 7.0
1 2.0 3.5 5.0 3.0 4.5 6.0 4.0 5.5 7.0 5.0 6.5 8.0
2 3.0 4.5 6.0 4.0 5.5 7.0 5.0 6.5 8.0 6.0 7.5 9.0
Now, the fb* columns actually reflect the means of the frequencies fb1: [1,2,3], fb2: [2,3,4,5] and fb3: [4,5,6], as I hope you intended.
Update 2: Note that if you set up your frequencies like this instead:
frequencies = ["f{0}".format(i) for i in np.arange(1,7)]
then you could e.g. create the mean of frequencies 'f1', 'f2', 'f3' in ch01 within C0 like this:
df['C0','ch01','fb1'] = df.loc(axis=1)[pd.IndexSlice['C0','ch01',['f1','f2','f3']]].mean(axis=1)
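Building on that, here is a sketch that applies the same label-based selection to all bands and condition/channel pairs at once (the band names and frequency-label lists below are the hypothetical ones from above):

```python
import numpy as np
import pandas as pd

# Rebuild the example frame, but with string frequency labels
D = np.zeros((32, 2, 2, 6))
for i in range(6):
    D[:, 0, 0, i] = np.arange(i, i + 32)       # C0, ch01
    D[:, 0, 1, i] = np.arange(i + 1, i + 33)   # C0, ch02
    D[:, 1, 0, i] = np.arange(i + 2, i + 34)   # C1, ch01
    D[:, 1, 1, i] = np.arange(i + 3, i + 35)   # C1, ch02
conditions = ['C0', 'C1']
channels = ["ch{:02}".format(i) for i in np.arange(1, 3)]
frequencies = ["f{0}".format(i) for i in np.arange(1, 7)]
cidx = pd.MultiIndex.from_product([conditions, channels, frequencies])
df = pd.DataFrame(D.reshape((D.shape[0], -1)), columns=cidx)

# Band definitions as explicit label lists (hypothetical names)
fbands = {'fb1': ['f1', 'f2', 'f3'],
          'fb2': ['f2', 'f3', 'f4', 'f5'],
          'fb3': ['f4', 'f5', 'f6']}

for co in df.columns.levels[0]:
    for ch in df.columns.levels[1]:
        for band, labels in fbands.items():
            # select the band's frequency columns by label and average them
            df[co, ch, band] = df.loc[:, pd.IndexSlice[co, ch, labels]].mean(axis=1)

# drop the raw frequency columns, keeping only the band means
df = df.drop(columns=frequencies, level=2)
```

Because the selection is by label, overlapping bands (here fb1 and fb2 both include 'f2' and 'f3') are no problem.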