
How to remove NaN when extending multilevel columns in pandas


I would like to extend the following two-level column index

first   tech_one                      ... tech_four       etc       mtc
second    ch0_b0    ch1_b0    ch2_b0  ...       ch5                    
0       1.764052  0.400157  0.978738  ...  0.144044  1.454274  0.761038
1       0.121675  0.443863  0.333674  ... -0.742165  2.269755 -1.454366
2       0.045759 -0.187184  1.532779  ...  1.230291  1.202380 -0.387327

into a three-level one:

   tech_one                      ... tech_four       etc       mtc
        ch0       ch1       ch2  ...       ch5         _         _
         b0        b0        b0  ...                  
0  1.764052  0.400157  0.978738  ...  0.144044  1.454274  0.761038
1  0.121675  0.443863  0.333674  ... -0.742165  2.269755 -1.454366
2  0.045759 -0.187184  1.532779  ...  1.230291  1.202380 -0.387327

I drafted the following code:

import pandas as pd
import numpy as np
import re
np.random.seed(0)


arrays = [["tech_one", "tech_one", "tech_one", "tech_one", "tech_two", "tech_two", "tech_two",
           "tech_two",'tech_three','tech_three','tech_four','etc','mtc'],
          ["ch0_b0", "ch1_b0", "ch2_b0", "ch3_b0", "ch0", "ch1", "ch2", "ch3","ch1",'ch3','ch5','','']]

index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])
df = pd.DataFrame(np.random.randn(3, len(arrays[0])), columns=index)
tup=[(e[0],*re.split('_',e[1])) for e in df.columns]
remove_nan=[tuple('_' if x == '' else x for x in x) for x in tup]

df.columns= pd.MultiIndex.from_tuples(remove_nan)

which produced the following:

   tech_one                      ... tech_four       etc       mtc
        ch0       ch1       ch2  ...       ch5         _         _
         b0        b0        b0  ...       NaN       NaN       NaN
0  1.764052  0.400157  0.978738  ...  0.144044  1.454274  0.761038
1  0.121675  0.443863  0.333674  ... -0.742165  2.269755 -1.454366
2  0.045759 -0.187184  1.532779  ...  1.230291  1.202380 -0.387327

As can be seen above, NaN still appears in the third level, even though I tried to replace the missing labels with this line:

remove_nan=[tuple('_' if x == '' else x for x in x) for x in tup]

May I know how to handle this issue?


Solution

  • As I mentioned in the comment, the NaN appears because the tuples have different lengths: the labels that contain no underscore split into fewer parts, so they have no entry for the third level. Pad every tuple to the length of the longest tuple in the list by appending empty strings '', then rebuild the MultiIndex.

    n = len(max(remove_nan, key=len))
    remove_nan=[t+('',)*(n-len(t)) for t in remove_nan]
    df.columns= pd.MultiIndex.from_tuples(remove_nan)
    

    OUTPUT:

       tech_one                      ... tech_four       etc       mtc
            ch0       ch1       ch2  ...       ch5         _         _
             b0        b0        b0  ...                              
    0 -0.969233  0.746873  0.253076  ...  0.087689  0.874305  0.380449
    1  0.387685 -0.382714 -1.043338  ... -1.684973  1.346454 -0.437792
    2 -1.300301  0.164648 -0.032736  ...  1.198207  1.608662 -0.818090
    [3 rows x 13 columns]
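
The padding step above can also be wrapped in a small helper. This is a minimal sketch rather than part of the original answer: the name `pad_tuples` and its `fill` parameter are hypothetical, and the three sample labels are a cut-down stand-in for the question's columns.

```python
import numpy as np
import pandas as pd


def pad_tuples(tuples, fill=""):
    """Pad each tuple with `fill` up to the length of the longest tuple.

    `pad_tuples` and `fill` are illustrative names, not a pandas API.
    """
    width = max(len(t) for t in tuples)
    return [t + (fill,) * (width - len(t)) for t in tuples]


# Ragged labels like those in the question: one has 3 parts, two have 2.
ragged = [("tech_one", "ch0", "b0"), ("tech_four", "ch5"), ("etc", "_")]

# After padding, every tuple has 3 parts, so no level is left missing.
padded = pd.MultiIndex.from_tuples(pad_tuples(ragged))

df = pd.DataFrame(np.zeros((2, len(ragged))), columns=padded)
print(df.columns.tolist())
```

Passing `fill="_"` instead would put a visible placeholder in the third level, if that is preferred over a blank label.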