python pandas dataframe merge multi-index

Concatenate dataframes with initially mismatching index structures (columns)

I have one bigger dataframe and a small one which only has one row.

the bigger one

route            TC2_37               ...             TD25             
                 value    daily_change  ...         value    daily_change
period                                ...                              
Aug 23          20339.0       4018.0  ...          26569.0       -951.0
Sep 23          19737.0       3037.0  ...          32725.0       -507.0
Oct 23          19821.0       1316.0  ...          38033.0        -18.0
Nov 23          20803.0        580.0  ...          40282.0       -188.0
Dec 23          22070.0        115.0  ...          42195.0       -148.0
Q3 23           18158.0       1891.0  ...          31269.0      -1102.0
Q4 23           20899.0        672.0  ...          40170.0       -117.0
Q1 24           16361.0        363.0  ...          37983.0       -125.0
Q2 24           14581.0        380.0  ...          28731.0        546.0
Q3 24           13029.0        415.0  ...          27840.0        628.0
Q4 24           16701.0        310.0  ...          33390.0        520.0
Cal 24          15168.0        367.0  ...          31986.0        393.0
Cal 25          13950.0         98.0  ...          30712.0        139.0

some columns are not shown but they all have same structures

the small dataframe looks like this:

route         A6TCE   BCTI   BDTI   MA2TCE  ...     TD7      TD8      TD25    V2TCE
period                                      ...                                   
2023-08-02  17134.0  720.0  821.0  28859.0  ...  9917.0  31700.0  10408.0  11800.0

The small dataframe has more routes than the bigger one,

I wish to create a new dataframe which has the small dataframe as the first row, but only with the columns(routes) which overlaps. And only under the column "value", NOT "daily_change"

    route            TC2_37               ...             TD25             
                     value    daily_change  ...         value    daily_change
    period
2023-08-02           990.0                  ...        10408.0                                    
    Aug 23          20339.0       4018.0  ...          26569.0       -951.0
    Sep 23          19737.0       3037.0  ...          32725.0       -507.0
    Oct 23          19821.0       1316.0  ...          38033.0        -18.0
    Nov 23          20803.0        580.0  ...          40282.0       -188.0
    Dec 23          22070.0        115.0  ...          42195.0       -148.0
    Q3 23           18158.0       1891.0  ...          31269.0      -1102.0
    Q4 23           20899.0        672.0  ...          40170.0       -117.0
    Q1 24           16361.0        363.0  ...          37983.0       -125.0
    Q2 24           14581.0        380.0  ...          28731.0        546.0
    Q3 24           13029.0        415.0  ...          27840.0        628.0
    Q4 24           16701.0        310.0  ...          33390.0        520.0
    Cal 24          15168.0        367.0  ...          31986.0        393.0
    Cal 25          13950.0         98.0  ...          30712.0        139.0

Reproduce this part of the bigger dataframe from dict:

{('TC2_37', 'value'): {'Aug 23': 20339.0, 'Sep 23': 19737.0, 'Oct 23': 19821.0, 'Nov 23': 20803.0, 'Dec 23': 22070.0, 'Q3 23': 18158.0, 'Q4 23': 20899.0, 'Q1 24': 16361.0, 'Q2 24': 14581.0, 'Q3 24': 13029.0, 'Q4 24': 16701.0, 'Cal 24': 15168.0, 'Cal 25': 13950.0}, 
 ('TC2_37', 'daily_change'): {'Aug 23': 4018.0, 'Sep 23': 3037.0, 'Oct 23': 1316.0, 'Nov 23': 580.0, 'Dec 23': 115.0, 'Q3 23': 1891.0, 'Q4 23': 672.0, 'Q1 24': 363.0, 'Q2 24': 380.0, 'Q3 24': 415.0, 'Q4 24': 310.0, 'Cal 24': 367.0, 'Cal 25': 98.0}, 
 ('TD25', 'value'): {'Aug 23': 26569.0, 'Sep 23': 32725.0, 'Oct 23': 38033.0, 'Nov 23': 40282.0, 'Dec 23': 42195.0, 'Q3 23': 31269.0, 'Q4 23': 40170.0, 'Q1 24': 37983.0, 'Q2 24': 28731.0, 'Q3 24': 27840.0, 'Q4 24': 33390.0, 'Cal 24': 31986.0, 'Cal 25': 30712.0}, 
 ('TD25', 'daily_change'): {'Aug 23': -951.0, 'Sep 23': -507.0, 'Oct 23': -18.0, 'Nov 23': -188.0, 'Dec 23': -148.0, 'Q3 23': -1102.0, 'Q4 23': -117.0, 'Q1 24': -125.0, 'Q2 24': 546.0, 'Q3 24': 628.0, 'Q4 24': 520.0, 'Cal 24': 393.0, 'Cal 25': 139.0}}

Solution

The columns of the large dataframe are multi-indexed, whereas the small dataframe has them flat. A merge requires compatible columns. Therefore, either flatten the large, or multi-index the small one.

Here, multi-indexing the small dataframe:

Input data: "bigger" dataframe

Periods    = ['Aug 23','Sep 23','Oct 23','Nov 23','Dec 23','Q3 23','Q4 23','Q1 24','Q2 24','Q3 24','Q4 24','Cal 24','Cal 25']
Routes     = ['TC2_37', 'TD25']
Categories = ['value','daily_change']
Data = [[20339.0,4018.0,26569.0,-951.0],
        [19737.0,3037.0,32725.0,-507.0],
        [19821.0,1316.0,38033.0,-18.0],
        [20803.0,580.0,40282.0,-188.0],
        [22070.0,115.0,42195.0,-148.0],
        [18158.0,1891.0,31269.0,-1102.0],
        [20899.0,672.0,40170.0,-117.0],
        [16361.0,363.0,37983.0,-125.0],
        [14581.0,380.0,28731.0,546.0],
        [13029.0,415.0,27840.0,628.0],
        [16701.0,310.0,33390.0,520.0],
        [15168.0,367.0,31986.0,393.0],
        [13950.0,98.0,30712.0,139.0]]

DF = pd.DataFrame(index   = pd.Index(Periods, name = 'periods'),
                  columns = pd.MultiIndex.from_product([Routes, Categories]),
                  data    = Data)
DF
          TC2_37                  TD25             
           value daily_change    value daily_change
periods                                            
Aug 23   20339.0       4018.0  26569.0       -951.0
Sep 23   19737.0       3037.0  32725.0       -507.0
Oct 23   19821.0       1316.0  38033.0        -18.0
Nov 23   20803.0        580.0  40282.0       -188.0
Dec 23   22070.0        115.0  42195.0       -148.0
Q3 23    18158.0       1891.0  31269.0      -1102.0
Q4 23    20899.0        672.0  40170.0       -117.0
Q1 24    16361.0        363.0  37983.0       -125.0
Q2 24    14581.0        380.0  28731.0        546.0
Q3 24    13029.0        415.0  27840.0        628.0
Q4 24    16701.0        310.0  33390.0        520.0
Cal 24   15168.0        367.0  31986.0        393.0
Cal 25   13950.0         98.0  30712.0        139.0

Input data: "small" dataframe

routes   = ['A6TCE','BCTI','BDTI','MA2TCE','TD7', 'TD8',  'TD25', 'V2TCE']
values   = [17134.0, 720.0, 821.0, 28859.0,9917.0,31700.0,10408.0,11800.0]
category = ['value']
period   = ['2023-08-02']

(1) If you can create it directly with multi-indexed columns:

df = pd.DataFrame(index   = pd.Index(period, name = 'periods'),
                  columns = pd.MultiIndex.from_product([routes, category]),
                  data    = [values])
df
              A6TCE   BCTI   BDTI   MA2TCE     TD7      TD8     TD25    V2TCE
              value  value  value    value   value    value    value    value
periods                                                                      
2023-08-02  17134.0  720.0  821.0  28859.0  9917.0  31700.0  10408.0  11800.0

(2) If you cannot create it directly with multi-indexed columns, then adding the level is simple: How to simply add a column level to a pandas dataframe

df = pd.DataFrame(index   = pd.Index(period, name = 'periods'),
                  columns = routes,
                  data    = [values])
df
              A6TCE   BCTI   BDTI   MA2TCE     TD7      TD8     TD25    V2TCE
periods                                                                      
2023-08-02  17134.0  720.0  821.0  28859.0  9917.0  31700.0  10408.0  11800.0

# Add the level:
df.columns = pd.MultiIndex.from_product([df.columns, ['value']])
df
              A6TCE   BCTI   BDTI   MA2TCE     TD7      TD8     TD25    V2TCE
              value  value  value    value   value    value    value    value
periods                                                                      
2023-08-02  17134.0  720.0  821.0  28859.0  9917.0  31700.0  10408.0  11800.0

Merging

Once index compatibility is there, merging is obvious:

pd.concat([DF, df], axis=0, join='inner')

Output: As requested,

only routes both dataframes had in common
and only the "value".

               TD25
              value
periods            
Aug 23      26569.0
Sep 23      32725.0
Oct 23      38033.0
Nov 23      40282.0
Dec 23      42195.0
Q3 23       31269.0
Q4 23       40170.0
Q1 24       37983.0
Q2 24       28731.0
Q3 24       27840.0
Q4 24       33390.0
Cal 24      31986.0
Cal 25      30712.0
2023-08-02  10408.0

As for sorting the index with new row on top, for that you will need to rephrase its contents in an homogeneous time unit, i.e. avoid mixing days, months or quarters, ideally using datetime format.