Search code examples
pandasreplaceduplicatesmulti-indexfill

How to map values of pandas MultiIndex dataframe to an other MultiIndex dataframe having a different shape?


I have the Following two MultiIndex dataframes with different shapes: Pandas dataframe 'a'

col0 = ['Set 1','Set 1','Set 1','Set 1','Set 2','Set 2','Set 2','Set 2','Set 2','Set 2']
col1 = ['paa','paa','jaa','paa','jaa','jaa','jaa','paa','paa','paa']
a = pd.DataFrame(data = np.random.randint(6, size=(3, 10)), columns = [col0,col1])

With output:

    Set 1             Set 2                    
    paa paa jaa paa   jaa jaa jaa paa paa paa
0     3   0   2   1     2   0   3   5   4   3
1     2   1   2   1     0   5   5   5   3   4
2     5   2   1   2     5   1   5   5   0   2

And dataframe 'b'

col0 = ['Set 1','Set 1','Set 2','Set 2']
col1 = ['P1_1','P1_2','P2_1','P2_2']
b = pd.DataFrame(data = np.random.randint(3, size=(3, 4)), columns = [col0,col1])

with output:

   Set 1      Set 2     
   P1_1 P1_2  P2_1 P2_2
0     2    1     1    2
1     0    0     2    2
2     0    0     1    0

Now I want to combine the two. Keeping the MultiIndex of pandas 'a, and the values of pandas'b'.

Desired output of pandas 'c':

      Set 1                   Set 2                    
      P1_1  P1_2  P1_1  P1_2  P1_1  P1_2  P1_1  P1_2  P1_1  P1_2
0     2     1     2     1     1     2     1     2     1     2
1     0     0     0     0     2     2     2     2     2     2
2     0     0     0     0     1     0     1     0     1     0

Level_0 of pandas 'c' coincides with level_0 of pandas 'b'. Level_1 in 'c' alternates with the columns of pandas 'b'..

You might have to combine the following in some way:

temp=b.reindex(columns=map(lambda x:(x[0],'P1_1') ,a.columns))
a.groupby(level=0, axis=1)

Anything will help!


Solution

  • Idea is match levels a and b and repeat columns second level used for DataFrame.reindex:

    np.random.seed(123)
        
    col0 = ['Set 1','Set 1','Set 1','Set 1','Set 2','Set 2','Set 2','Set 2','Set 2','Set 2']
    col1 = ['paa','paa','jaa','paa','jaa','jaa','jaa','paa','paa','paa']
    a = pd.DataFrame(data = np.random.randint(6, size=(3, 10)), columns = [col0,col1])
    
    col0 = ['Set 1','Set 1','Set 2','Set 2']
    col1 = ['P1_1','P1_2','P2_1','P2_2']
    b = pd.DataFrame(data = np.random.randint(3, size=(3, 4)), columns = [col0,col1])
    
    
    print (a)
      Set 1             Set 2                    
        paa paa jaa paa   jaa jaa jaa paa paa paa
    0     5   2   4   2     1   3   2   3   1   1
    1     0   1   1   0     0   1   3   5   4   0
    2     0   4   1   3     2   4   2   4   0   5
    
    print (b)
      Set 1      Set 2     
       P1_1 P1_2  P2_1 P2_2
    0     0    1     0    0
    1     0    2     1    1
    2     2    2     2    1
    

    #https://stackoverflow.com/a/3391105
    def repeat_to_length(s, wanted):
        return (s * (wanted//len(s) + 1))[:wanted]
    
    
    out = []
    for lvl in a.columns.levels[0]:
        colsa = a.xs(lvl, axis=1, level=0).columns.tolist()
        colsb = b.xs(lvl, axis=1, level=0).columns.tolist()
        lvl1 = repeat_to_length(colsb, len(colsa))
        out.extend(list(zip([lvl] * len(lvl1), lvl1)))
    
    print (out)
    [('Set 1', 'P1_1'), ('Set 1', 'P1_2'), ('Set 1', 'P1_1'), 
     ('Set 1', 'P1_2'), ('Set 2', 'P2_1'), ('Set 2', 'P2_2'), 
     ('Set 2', 'P2_1'), ('Set 2', 'P2_2'), ('Set 2', 'P2_1'), ('Set 2', 'P2_2')]
    
    mux = pd.MultiIndex.from_tuples(out)
    print (mux)
    MultiIndex([('Set 1', 'P1_1'),
                ('Set 1', 'P1_2'),
                ('Set 1', 'P1_1'),
                ('Set 1', 'P1_2'),
                ('Set 2', 'P2_1'),
                ('Set 2', 'P2_2'),
                ('Set 2', 'P2_1'),
                ('Set 2', 'P2_2'),
                ('Set 2', 'P2_1'),
                ('Set 2', 'P2_2')],
               )
    

    c = b.reindex(mux, axis=1)
    print (c)
      Set 1                Set 2                         
       P1_1 P1_2 P1_1 P1_2  P2_1 P2_2 P2_1 P2_2 P2_1 P2_2
    0     0    1    0    1     0    0    0    0    0    0
    1     0    2    0    2     1    1    1    1    1    1
    2     2    2    2    2     2    1    2    1    2    1