changing and setting one level of a multiindex dataframe with a function

Assuming a multiindex dataframe as follows

THE (DUMMY) DATA

import pandas as pd

df={('AB30566', 'ACTIVE1', 'A1'): {('2021-01-01', 'PHOTO'): 2,
 ('2021-01-01', 'QUE'): 8,
 ('2021-01-01', 'TXR'): 4,
 ('2022-02-01', 'PHOTO'): 4,
 ('2022-02-01', 'QUE'): 0,
 ('2022-02-01', 'TXR'): 1,
 ('2022-03-01', 'PHOTO'): 9,
 ('2022-03-01', 'QUE'): 7,
 ('2022-03-01', 'TXR'): 7},
 ('CD55DF55', 'ACTIVE2', 'A2'): {('2021-01-01', 'PHOTO'): 1,
 ('2021-01-01', 'QUE'): 7,
 ('2021-01-01', 'TXR'): 0,
 ('2022-02-01', 'PHOTO'): 8,
 ('2022-02-01', 'QUE'): 8,
 ('2022-02-01', 'TXR'): 3,
 ('2022-03-01', 'PHOTO'): 6,
 ('2022-03-01', 'QUE'): 0,
 ('2022-03-01', 'TXR'): 7},
('ZT52556', 'UNACTIVE1', 'A3'): {('2021-01-01', 'PHOTO'): 8,
  ('2021-01-01', 'QUE'): 9,
  ('2021-01-01', 'TXR'): 3,
  ('2022-02-01', 'PHOTO'): 5,
  ('2022-02-01', 'QUE'): 3,
  ('2022-02-01', 'TXR'): 0,
  ('2022-03-01', 'PHOTO'): 7,
  ('2022-03-01', 'QUE'): 0,
  ('2022-03-01', 'TXR'): 9},
 ('MIKE90', 'PENSIONER1', 'A4'): {('2021-01-01', 'PHOTO'): 3,
  ('2021-01-01', 'QUE'): 9,
  ('2021-01-01', 'TXR'): 8,
  ('2022-02-01', 'PHOTO'): 3,
  ('2022-02-01', 'QUE'): 2,
  ('2022-02-01', 'TXR'): 1,
  ('2022-03-01', 'PHOTO'): 9,
  ('2022-03-01', 'QUE'): 0,
  ('2022-03-01', 'TXR'): 4},
 ('ZZ00001', 'ACTIVE3', 'A5'): {('2021-01-01', 'PHOTO'): 0,
  ('2021-01-01', 'QUE'): 2,
  ('2021-01-01', 'TXR'): 1,
  ('2022-02-01', 'PHOTO'): 2,
  ('2022-02-01', 'QUE'): 0,
  ('2022-02-01', 'TXR'): 8,
  ('2022-03-01', 'PHOTO'): 5,
  ('2022-03-01', 'QUE'): 6,
  ('2022-03-01', 'TXR'): 0}}

(The real case is much bigger of course)

I need to change the values of the names in the level 0 called userid based on a function.

I do it in the following way and this strange result happens:

THE CODE & WRONG SOLUTION

d=pd.DataFrame(f)
d.columns.names =["USERID", "STATUS","LEVEL"]
def simple_mask_user_id(userids):
    exam_dict = {userid:("EX"+str(i).zfill(5) if re.match(r"[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]",userid) else userid) for i,userid in enumerate(userids) }
    return exam_dict
current_userids = d.columns.get_level_values('USERID').tolist()
dict_mask = simple_mask_user_id(current_userids)
display(d)
new_names = d.columns.get_level_values("USERID").map(dict_mask).tolist()
print(new_names)
d.columns.set_levels(new_names, level=0, inplace=True)
display(d)

The level USERID of the dataframe should be chaged accoring to the dict:

{'AB30566': 'EX00000', 'CD55DF55': 'CD55DF55', 'ZT52556': 'EX00002', 'MIKE90': 'MIKE90', 'ZZ00001': 'EX00004'}

THE FAULTY RESULT

I display the df to compared the result before and after. The index got mixed.

MIKE90 and EX00002 gets changed by each other.

In other words, MIKE90 is not on top of the corresponding PENSIONER1, A4, which is the other levels corresponding to it (MIKE90 does not get changed) You can also see that the order of the list new names has the correct order.

THE QUESTIONS

Why? How do you change one level of the multindex without altering the data?

Solution

I would use MultiIndex.map and a given mapping dictionary (d) to substitue level=0 values

df.columns = df.columns.map(lambda c: (d[c[0]], *c[1:]))

Result

                 EX00000 CD55DF55   EX00002     MIKE90 EX00004
                 ACTIVE1  ACTIVE2 UNACTIVE1 PENSIONER1 ACTIVE3
                      A1       A2        A3         A4      A5
2021-01-01 PHOTO       2        1         8          3       0
           QUE         8        7         9          9       2
           TXR         4        0         3          8       1
2022-02-01 PHOTO       4        8         5          3       2
           QUE         0        8         3          2       0
           TXR         1        3         0          1       8
2022-03-01 PHOTO       9        6         7          9       5
           QUE         7        0         0          0       6
           TXR         7        7         9          4       0