Search code examples
pythonpandasmulti-index

changing and setting one level of a multiindex dataframe with a function


Assuming a multiindex dataframe as follows

THE (DUMMY) DATA

import pandas as pd

df={('AB30566', 'ACTIVE1', 'A1'): {('2021-01-01', 'PHOTO'): 2,
 ('2021-01-01', 'QUE'): 8,
 ('2021-01-01', 'TXR'): 4,
 ('2022-02-01', 'PHOTO'): 4,
 ('2022-02-01', 'QUE'): 0,
 ('2022-02-01', 'TXR'): 1,
 ('2022-03-01', 'PHOTO'): 9,
 ('2022-03-01', 'QUE'): 7,
 ('2022-03-01', 'TXR'): 7},
 ('CD55DF55', 'ACTIVE2', 'A2'): {('2021-01-01', 'PHOTO'): 1,
 ('2021-01-01', 'QUE'): 7,
 ('2021-01-01', 'TXR'): 0,
 ('2022-02-01', 'PHOTO'): 8,
 ('2022-02-01', 'QUE'): 8,
 ('2022-02-01', 'TXR'): 3,
 ('2022-03-01', 'PHOTO'): 6,
 ('2022-03-01', 'QUE'): 0,
 ('2022-03-01', 'TXR'): 7},
('ZT52556', 'UNACTIVE1', 'A3'): {('2021-01-01', 'PHOTO'): 8,
  ('2021-01-01', 'QUE'): 9,
  ('2021-01-01', 'TXR'): 3,
  ('2022-02-01', 'PHOTO'): 5,
  ('2022-02-01', 'QUE'): 3,
  ('2022-02-01', 'TXR'): 0,
  ('2022-03-01', 'PHOTO'): 7,
  ('2022-03-01', 'QUE'): 0,
  ('2022-03-01', 'TXR'): 9},
 ('MIKE90', 'PENSIONER1', 'A4'): {('2021-01-01', 'PHOTO'): 3,
  ('2021-01-01', 'QUE'): 9,
  ('2021-01-01', 'TXR'): 8,
  ('2022-02-01', 'PHOTO'): 3,
  ('2022-02-01', 'QUE'): 2,
  ('2022-02-01', 'TXR'): 1,
  ('2022-03-01', 'PHOTO'): 9,
  ('2022-03-01', 'QUE'): 0,
  ('2022-03-01', 'TXR'): 4},
 ('ZZ00001', 'ACTIVE3', 'A5'): {('2021-01-01', 'PHOTO'): 0,
  ('2021-01-01', 'QUE'): 2,
  ('2021-01-01', 'TXR'): 1,
  ('2022-02-01', 'PHOTO'): 2,
  ('2022-02-01', 'QUE'): 0,
  ('2022-02-01', 'TXR'): 8,
  ('2022-03-01', 'PHOTO'): 5,
  ('2022-03-01', 'QUE'): 6,
  ('2022-03-01', 'TXR'): 0}}
    

(The real case is much bigger of course)

I need to change the values of the names in the level 0 called userid based on a function.

I do it in the following way and this strange result happens:

THE CODE & WRONG SOLUTION

d=pd.DataFrame(f)
d.columns.names =["USERID", "STATUS","LEVEL"]
def simple_mask_user_id(userids):
    exam_dict = {userid:("EX"+str(i).zfill(5) if re.match(r"[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]",userid) else userid) for i,userid in enumerate(userids) }
    return exam_dict
current_userids = d.columns.get_level_values('USERID').tolist()
dict_mask = simple_mask_user_id(current_userids)
display(d)
new_names = d.columns.get_level_values("USERID").map(dict_mask).tolist()
print(new_names)
d.columns.set_levels(new_names, level=0, inplace=True)
display(d)

The level USERID of the dataframe should be chaged accoring to the dict:

{'AB30566': 'EX00000', 'CD55DF55': 'CD55DF55', 'ZT52556': 'EX00002', 'MIKE90': 'MIKE90', 'ZZ00001': 'EX00004'}

THE FAULTY RESULT

I display the df to compared the result before and after. The index got mixed.

MIKE90 and EX00002 gets changed by each other.

In other words, MIKE90 is not on top of the corresponding PENSIONER1, A4, which is the other levels corresponding to it (MIKE90 does not get changed) You can also see that the order of the list new names has the correct order.

THE QUESTIONS

Why? How do you change one level of the multindex without altering the data?

enter image description here


Solution

  • I would use MultiIndex.map and a given mapping dictionary (d) to substitue level=0 values

    df.columns = df.columns.map(lambda c: (d[c[0]], *c[1:]))
    

    Result

                     EX00000 CD55DF55   EX00002     MIKE90 EX00004
                     ACTIVE1  ACTIVE2 UNACTIVE1 PENSIONER1 ACTIVE3
                          A1       A2        A3         A4      A5
    2021-01-01 PHOTO       2        1         8          3       0
               QUE         8        7         9          9       2
               TXR         4        0         3          8       1
    2022-02-01 PHOTO       4        8         5          3       2
               QUE         0        8         3          2       0
               TXR         1        3         0          1       8
    2022-03-01 PHOTO       9        6         7          9       5
               QUE         7        0         0          0       6
               TXR         7        7         9          4       0