Search code examples
pythonpandas

How to subtract data between columns that have same subfix


I have a sample dataframe as below that has same subfix as 001, 002, 003.

import pandas as pd
import numpy as np

branch_names = [f"Branch_{i}" for i in range(1, 11)]
date_1 = '20241231'
date_2 = '20250214'
date_3 = '20250220'

data = {
    'Branch': branch_names,
    date_1 + '_001': np.random.randint(60, 90, 10),
    date_1 + '_002': np.random.randint(60, 90, 10),
    date_1 + '_003': np.random.randint(60, 90, 10),
    date_2 + '_001': np.random.randint(60, 90, 10),
    date_2 + '_002': np.random.randint(60, 90, 10),
    date_2 + '_003': np.random.randint(60, 90, 10),
    date_3 + '_001': np.random.randint(60, 90, 10),
    date_3 + '_002': np.random.randint(60, 90, 10),
    date_3 + '_003': np.random.randint(60, 90, 10)
}

# Chuyển thành DataFrame
df = pd.DataFrame(data)

enter image description here

Now I want to subtract data between columns that have same subfix as below principle:

df['diff_1_001'] = df[date_3 + '_001'] - df[date_2 + '_001']
df['diff_2_001'] = df[date_3 + '_001'] - df[date_1 + '_001']
df['diff_1_002'] = df[date_3 + '_002'] - df[date_2 + '_002']
df['diff_2_002'] = df[date_3 + '_002'] - df[date_1 + '_002']
df['diff_1_003'] = df[date_3 + '_003'] - df[date_2 + '_003']
df['diff_2_003'] = df[date_3 + '_003'] - df[date_1 + '_003']
df

As you see that we have same 001, 002, 003 but prefix is different. So I want don't want to hard code the 001, 002, 003 but automatically subtract it as mentioned above.


Solution

  • You would typically use a MultiIndex here, which makes operations much easier than relying on substrings:

    # set "Branch" as index, convert columns to MultiIndex
    df2 = df.set_index('Branch')
    df2.columns = df2.columns.str.split('_', expand=True).rename(['date', 'id'])
    
    # perform the operation and join
    out = df2.join(df2[date_3].sub(df2.drop(columns=[date_3]))
                   .rename(lambda x: f'diff_{x}', level=0, axis=1)
                  )
    

    Output:

    date      20241231         20250214         20250220         diff_20241231         diff_20250214        
    id             001 002 003      001 002 003      001 002 003           001 002 003           001 002 003
    Branch                                                                                                  
    Branch_1        82  72  68       62  89  86       89  64  77             7  -8   9            27 -25  -9
    Branch_2        72  66  80       87  63  78       81  60  76             9  -6  -4            -6  -3  -2
    Branch_3        84  63  70       79  63  72       61  71  63           -23   8  -7           -18   8  -9
    Branch_4        89  82  82       85  67  63       72  62  84           -17 -20   2           -13  -5  21
    Branch_5        84  89  71       83  69  69       62  65  87           -22 -24  16           -21  -4  18
    Branch_6        63  65  65       81  69  70       62  81  68            -1  16   3           -19  12  -2
    Branch_7        78  83  89       79  69  87       84  76  80             6  -7  -9             5   7  -7
    Branch_8        75  71  88       74  83  73       61  68  64           -14  -3 -24           -13 -15  -9
    Branch_9        63  60  75       80  63  67       65  89  76             2  29   1           -15  26   9
    Branch_10       70  71  68       81  74  67       68  61  85            -2 -10  17           -13 -13  18
    

    If needed you can always come back to a flat index later:

    out.columns = out.columns.map('_'.join)
    out.reset_index(inplace=True)
    

    Output:

          Branch  20241231_001  20241231_002  20241231_003  20250214_001  20250214_002  20250214_003  20250220_001  20250220_002  20250220_003  diff_20241231_001  diff_20241231_002  diff_20241231_003  diff_20250214_001  diff_20250214_002  diff_20250214_003
    0   Branch_1            63            89            62            67            69            86            68            67            88                  5                -22                 26                  1                 -2                  2
    1   Branch_2            67            80            75            78            85            60            84            83            64                 17                  3                -11                  6                 -2                  4
    2   Branch_3            88            89            87            88            78            82            87            73            85                 -1                -16                 -2                 -1                 -5                  3
    3   Branch_4            63            62            81            71            60            76            89            86            60                 26                 24                -21                 18                 26                -16
    4   Branch_5            78            65            67            79            87            70            87            77            70                  9                 12                  3                  8                -10                  0
    5   Branch_6            89            65            67            77            69            64            74            84            74                -15                 19                  7                 -3                 15                 10
    6   Branch_7            77            72            71            69            88            84            83            80            82                  6                  8                 11                 14                 -8                 -2
    7   Branch_8            61            72            82            89            71            80            60            83            88                 -1                 11                  6                -29                 12                  8
    8   Branch_9            78            81            77            74            77            63            79            60            80                  1                -21                  3                  5                -17                 17
    9  Branch_10            77            89            66            81            69            79            68            71            78                 -9                -18                 12                -13                  2                 -1