Assuming a multiindex dataframe as follows
THE (DUMMY) DATA
import pandas as pd
df={('AB30566', 'ACTIVE1', 'A1'): {('2021-01-01', 'PHOTO'): 2,
('2021-01-01', 'QUE'): 8,
('2021-01-01', 'TXR'): 4,
('2022-02-01', 'PHOTO'): 4,
('2022-02-01', 'QUE'): 0,
('2022-02-01', 'TXR'): 1,
('2022-03-01', 'PHOTO'): 9,
('2022-03-01', 'QUE'): 7,
('2022-03-01', 'TXR'): 7},
('CD55DF55', 'ACTIVE2', 'A2'): {('2021-01-01', 'PHOTO'): 1,
('2021-01-01', 'QUE'): 7,
('2021-01-01', 'TXR'): 0,
('2022-02-01', 'PHOTO'): 8,
('2022-02-01', 'QUE'): 8,
('2022-02-01', 'TXR'): 3,
('2022-03-01', 'PHOTO'): 6,
('2022-03-01', 'QUE'): 0,
('2022-03-01', 'TXR'): 7},
('ZT52556', 'UNACTIVE1', 'A3'): {('2021-01-01', 'PHOTO'): 8,
('2021-01-01', 'QUE'): 9,
('2021-01-01', 'TXR'): 3,
('2022-02-01', 'PHOTO'): 5,
('2022-02-01', 'QUE'): 3,
('2022-02-01', 'TXR'): 0,
('2022-03-01', 'PHOTO'): 7,
('2022-03-01', 'QUE'): 0,
('2022-03-01', 'TXR'): 9},
('MIKE90', 'PENSIONER1', 'A4'): {('2021-01-01', 'PHOTO'): 3,
('2021-01-01', 'QUE'): 9,
('2021-01-01', 'TXR'): 8,
('2022-02-01', 'PHOTO'): 3,
('2022-02-01', 'QUE'): 2,
('2022-02-01', 'TXR'): 1,
('2022-03-01', 'PHOTO'): 9,
('2022-03-01', 'QUE'): 0,
('2022-03-01', 'TXR'): 4},
('ZZ00001', 'ACTIVE3', 'A5'): {('2021-01-01', 'PHOTO'): 0,
('2021-01-01', 'QUE'): 2,
('2021-01-01', 'TXR'): 1,
('2022-02-01', 'PHOTO'): 2,
('2022-02-01', 'QUE'): 0,
('2022-02-01', 'TXR'): 8,
('2022-03-01', 'PHOTO'): 5,
('2022-03-01', 'QUE'): 6,
('2022-03-01', 'TXR'): 0}}
(The real case is much bigger of course)
I need to change the values of the names in the level 0 called userid based on a function.
I do it in the following way and this strange result happens:
THE CODE & WRONG SOLUTION
d=pd.DataFrame(f)
d.columns.names =["USERID", "STATUS","LEVEL"]
def simple_mask_user_id(userids):
exam_dict = {userid:("EX"+str(i).zfill(5) if re.match(r"[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]",userid) else userid) for i,userid in enumerate(userids) }
return exam_dict
current_userids = d.columns.get_level_values('USERID').tolist()
dict_mask = simple_mask_user_id(current_userids)
display(d)
new_names = d.columns.get_level_values("USERID").map(dict_mask).tolist()
print(new_names)
d.columns.set_levels(new_names, level=0, inplace=True)
display(d)
The level USERID of the dataframe should be chaged accoring to the dict:
{'AB30566': 'EX00000', 'CD55DF55': 'CD55DF55', 'ZT52556': 'EX00002', 'MIKE90': 'MIKE90', 'ZZ00001': 'EX00004'}
THE FAULTY RESULT
I display the df to compared the result before and after. The index got mixed.
MIKE90 and EX00002 gets changed by each other.
In other words, MIKE90 is not on top of the corresponding PENSIONER1, A4, which is the other levels corresponding to it (MIKE90 does not get changed) You can also see that the order of the list new names has the correct order.
THE QUESTIONS
Why? How do you change one level of the multindex without altering the data?
I would use MultiIndex.map
and a given mapping dictionary (d
) to substitue level=0 values
df.columns = df.columns.map(lambda c: (d[c[0]], *c[1:]))
Result
EX00000 CD55DF55 EX00002 MIKE90 EX00004
ACTIVE1 ACTIVE2 UNACTIVE1 PENSIONER1 ACTIVE3
A1 A2 A3 A4 A5
2021-01-01 PHOTO 2 1 8 3 0
QUE 8 7 9 9 2
TXR 4 0 3 8 1
2022-02-01 PHOTO 4 8 5 3 2
QUE 0 8 3 2 0
TXR 1 3 0 1 8
2022-03-01 PHOTO 9 6 7 9 5
QUE 7 0 0 0 6
TXR 7 7 9 4 0