I have a sample dataframe as below that has same subfix as 001, 002, 003
.
import pandas as pd
import numpy as np
branch_names = [f"Branch_{i}" for i in range(1, 11)]
date_1 = '20241231'
date_2 = '20250214'
date_3 = '20250220'
data = {
'Branch': branch_names,
date_1 + '_001': np.random.randint(60, 90, 10),
date_1 + '_002': np.random.randint(60, 90, 10),
date_1 + '_003': np.random.randint(60, 90, 10),
date_2 + '_001': np.random.randint(60, 90, 10),
date_2 + '_002': np.random.randint(60, 90, 10),
date_2 + '_003': np.random.randint(60, 90, 10),
date_3 + '_001': np.random.randint(60, 90, 10),
date_3 + '_002': np.random.randint(60, 90, 10),
date_3 + '_003': np.random.randint(60, 90, 10)
}
# Chuyển thành DataFrame
df = pd.DataFrame(data)
Now I want to subtract data between columns that have same subfix as below principle:
df['diff_1_001'] = df[date_3 + '_001'] - df[date_2 + '_001']
df['diff_2_001'] = df[date_3 + '_001'] - df[date_1 + '_001']
df['diff_1_002'] = df[date_3 + '_002'] - df[date_2 + '_002']
df['diff_2_002'] = df[date_3 + '_002'] - df[date_1 + '_002']
df['diff_1_003'] = df[date_3 + '_003'] - df[date_2 + '_003']
df['diff_2_003'] = df[date_3 + '_003'] - df[date_1 + '_003']
df
As you see that we have same 001, 002, 003
but prefix is different. So I want don't want to hard code the 001, 002, 003
but automatically subtract it as mentioned above.
You would typically use a MultiIndex here, which makes operations much easier than relying on substrings:
# set "Branch" as index, convert columns to MultiIndex
df2 = df.set_index('Branch')
df2.columns = df2.columns.str.split('_', expand=True).rename(['date', 'id'])
# perform the operation and join
out = df2.join(df2[date_3].sub(df2.drop(columns=[date_3]))
.rename(lambda x: f'diff_{x}', level=0, axis=1)
)
Output:
date 20241231 20250214 20250220 diff_20241231 diff_20250214
id 001 002 003 001 002 003 001 002 003 001 002 003 001 002 003
Branch
Branch_1 82 72 68 62 89 86 89 64 77 7 -8 9 27 -25 -9
Branch_2 72 66 80 87 63 78 81 60 76 9 -6 -4 -6 -3 -2
Branch_3 84 63 70 79 63 72 61 71 63 -23 8 -7 -18 8 -9
Branch_4 89 82 82 85 67 63 72 62 84 -17 -20 2 -13 -5 21
Branch_5 84 89 71 83 69 69 62 65 87 -22 -24 16 -21 -4 18
Branch_6 63 65 65 81 69 70 62 81 68 -1 16 3 -19 12 -2
Branch_7 78 83 89 79 69 87 84 76 80 6 -7 -9 5 7 -7
Branch_8 75 71 88 74 83 73 61 68 64 -14 -3 -24 -13 -15 -9
Branch_9 63 60 75 80 63 67 65 89 76 2 29 1 -15 26 9
Branch_10 70 71 68 81 74 67 68 61 85 -2 -10 17 -13 -13 18
If needed you can always come back to a flat index later:
out.columns = out.columns.map('_'.join)
out.reset_index(inplace=True)
Output:
Branch 20241231_001 20241231_002 20241231_003 20250214_001 20250214_002 20250214_003 20250220_001 20250220_002 20250220_003 diff_20241231_001 diff_20241231_002 diff_20241231_003 diff_20250214_001 diff_20250214_002 diff_20250214_003
0 Branch_1 63 89 62 67 69 86 68 67 88 5 -22 26 1 -2 2
1 Branch_2 67 80 75 78 85 60 84 83 64 17 3 -11 6 -2 4
2 Branch_3 88 89 87 88 78 82 87 73 85 -1 -16 -2 -1 -5 3
3 Branch_4 63 62 81 71 60 76 89 86 60 26 24 -21 18 26 -16
4 Branch_5 78 65 67 79 87 70 87 77 70 9 12 3 8 -10 0
5 Branch_6 89 65 67 77 69 64 74 84 74 -15 19 7 -3 15 10
6 Branch_7 77 72 71 69 88 84 83 80 82 6 8 11 14 -8 -2
7 Branch_8 61 72 82 89 71 80 60 83 88 -1 11 6 -29 12 8
8 Branch_9 78 81 77 74 77 63 79 60 80 1 -21 3 5 -17 17
9 Branch_10 77 89 66 81 69 79 68 71 78 -9 -18 12 -13 2 -1