I can take an average of two columns, and return it as a new column, but I can't figure out how to do it for all the columns in the specific convention that raw data has. The goal is to average A1 & D1, ..., A12 & D12 etc..
I tried different iterations of what's below but it doesn't work if I don't specify exact columns...
import pandas as pd
df = pd.DataFrame({'Time': [0.1, 0.2, 0.3],
'A1': [1000, 2000, 3000],
'A12': [1000, 2000, 3000],
'B1': [4000, 5000, 6000],
'B12': [4000, 5000, 6000],
'D1': [40000, 50000, 42000],
'D12': [40000, 50000, 42000]})
for column in d.columns:
if column.startswith('A') and column.startswith('D'):
d= (d[['A1', 'D1']].iloc[:].mean(axis=1))
column.startswith('A') and column.startswith('D')
will never be true.A
and D
.[1:]
begins the slice after the first character (e.g. A1[1:]
is 1
)pandas.Series.str.contains
to find only the columns of interest, A
and D
, in this case.
.contains
is used, because it recognizes a regex pattern, and .startswith
does not..groupby
only on the desired columns
df[cols].groupby(df[cols].columns.str[1:], axis=1)
# select columns with A or D
cols = df.columns[df.columns.str.contains('A|D')]
dfm = df.join(df[cols].groupby(df[cols].columns.str[1:], axis=1).mean().astype(float).add_prefix('AD_mean_'))
Time A1 A12 B1 B12 D1 D12 AD_mean_1 AD_mean_12
0 0.1 1000 1000 4000 4000 40000 40000 20500.0 20500.0
1 0.2 2000 2000 5000 5000 50000 50000 26000.0 26000.0
2 0.3 3000 3000 6000 6000 42000 42000 22500.0 22500.0