I would like to expand the following two-level column MultiIndex into three levels by splitting each second-level label on the underscore:
first tech_one ... tech_four etc mtc
second ch0_b0 ch1_b0 ch2_b0 ... ch5
0 1.764052 0.400157 0.978738 ... 0.144044 1.454274 0.761038
1 0.121675 0.443863 0.333674 ... -0.742165 2.269755 -1.454366
2 0.045759 -0.187184 1.532779 ... 1.230291 1.202380 -0.387327
Into
tech_one ... tech_four etc mtc
ch0 ch1 ch2 ... ch5 _ _
b0 b0 b0 ...
0 1.764052 0.400157 0.978738 ... 0.144044 1.454274 0.761038
1 0.121675 0.443863 0.333674 ... -0.742165 2.269755 -1.454366
2 0.045759 -0.187184 1.532779 ... 1.230291 1.202380 -0.387327
I drafted the following code:
import pandas as pd
import numpy as np
import re
np.random.seed(0)
arrays = [["tech_one", "tech_one", "tech_one", "tech_one", "tech_two", "tech_two", "tech_two",
"tech_two",'tech_three','tech_three','tech_four','etc','mtc'],
["ch0_b0", "ch1_b0", "ch2_b0", "ch3_b0", "ch0", "ch1", "ch2", "ch3","ch1",'ch3','ch5','','']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])
df = pd.DataFrame(np.random.randn(3, len(arrays[0])), columns=index)
tup=[(e[0],*re.split('_',e[1])) for e in df.columns]
remove_nan=[tuple('_' if x == '' else x for x in x) for x in tup]
df.columns= pd.MultiIndex.from_tuples(remove_nan)
which produced the following:
tech_one ... tech_four etc mtc
ch0 ch1 ch2 ... ch5 _ _
b0 b0 b0 ... NaN NaN NaN
0 1.764052 0.400157 0.978738 ... 0.144044 1.454274 0.761038
1 0.121675 0.443863 0.333674 ... -0.742165 2.269755 -1.454366
2 0.045759 -0.187184 1.532779 ... 1.230291 1.202380 -0.387327
As can be seen above, the NaN values are still there, even though I tried to replace the empty labels with '_' using the line
remove_nan=[tuple('_' if x == '' else x for x in x) for x in tup]
How can I handle this issue?
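As I mentioned in the comment, the NaN values come from tuples of different lengths. Labels such as ch0_b0 split into two parts, so their tuples have three elements, while labels without an underscore (ch0, or the empty label) produce only two, and MultiIndex.from_tuples fills the missing third level with NaN. Replacing '' with '_' does not change the tuple lengths, so it cannot remove the NaN. Instead, build a list of tuples of equal length: take the length of the longest tuple and pad the shorter tuples with the empty string ''.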
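To see the length mismatch directly, here is a small check on a hypothetical subset of three of the question's column labels, split with the same re.split call. It is only a diagnostic sketch, not part of the fix.

```python
import re

import pandas as pd

# Hypothetical subset of the question's (first, second) column labels.
labels = [("tech_one", "ch0_b0"), ("tech_two", "ch0"), ("etc", "")]

# Same splitting as in the question: "ch0_b0" -> 3 parts, "ch0" and "" -> 2 parts.
tup = [(first, *re.split("_", second)) for first, second in labels]
lengths = sorted({len(t) for t in tup})
print(lengths)

# from_tuples still creates 3 levels; the shorter tuples get a missing third level.
mi = pd.MultiIndex.from_tuples(tup)
print(mi.nlevels)
print(mi.get_level_values(2).tolist())
```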
n = len(max(remove_nan, key=len))
remove_nan=[t+('',)*(n-len(t)) for t in remove_nan]
df.columns= pd.MultiIndex.from_tuples(remove_nan)
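For reference, the splitting, the '' to '_' replacement and the padding can also be done in a single pass. This is a self-contained sketch that rebuilds the question's DataFrame; it uses str.split instead of re.split (a swap on my part, since the separator is a plain character), and the padding value '' matches the answer above.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame from the question.
np.random.seed(0)
first = ["tech_one"] * 4 + ["tech_two"] * 4 + ["tech_three"] * 2 + ["tech_four", "etc", "mtc"]
second = ["ch0_b0", "ch1_b0", "ch2_b0", "ch3_b0", "ch0", "ch1", "ch2", "ch3",
          "ch1", "ch3", "ch5", "", ""]
columns = pd.MultiIndex.from_arrays([first, second], names=["first", "second"])
df = pd.DataFrame(np.random.randn(3, len(first)), columns=columns)

# Split the second level on "_" and turn empty labels into "_".
parts = [(f, *("_" if p == "" else p for p in s.split("_"))) for f, s in df.columns]

# Pad every tuple with "" up to the longest length so no level is left missing.
n = max(len(t) for t in parts)
df.columns = pd.MultiIndex.from_tuples([t + ("",) * (n - len(t)) for t in parts])
print(df)
```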
OUTPUT:
tech_one ... tech_four etc mtc
ch0 ch1 ch2 ... ch5 _ _
b0 b0 b0 ...
0 -0.969233 0.746873 0.253076 ... 0.087689 0.874305 0.380449
1 0.387685 -0.382714 -1.043338 ... -1.684973 1.346454 -0.437792
2 -1.300301 0.164648 -0.032736 ... 1.198207 1.608662 -0.818090
[3 rows x 13 columns]