I have a question hoping you can help me out here. I have an excel-file with multiple header, which I read following way
df=pd.read_excel(path,sheet_name="name",header=[1,2,3,4,5,6]
The original excel files looks like this
Required 1 - Person
Legacy Contact Person Number Business partner category
Central Data Central Data
Header Data Header Data
CHAR CHAR
10 1
PARTNER TYPE
Test 1 Name 1
Test 2 Name 2
Test 3 Name 3
Test 4 Name 4
Test 5 Name 5
Problem: I am trying to find an easy way to have a following statement
df["PARTNER"]=df["PARTNER"].fillna("Missing")
I realized that there is an issue by printing the command
print(df["PARTNER"])
Question: Is there any way on how to print a column in a frame with multiple header and select the header I am interested in, in this case "PARTNER"?
To select (print) data from a particular level of a MultiIndex you can use xs
:
df = pd.DataFrame({('Legacy Contact Person Number', 'Central Data', 'Header Data', 'CHAR', '10', 'PARTNER'): ['Test 1', None, 'Test 3', 'Test 4', 'Test 5'],
('Business partner category', 'Central Data', 'Header Data', 'CHAR', '1', 'TYPE'): ['Name 1', 'Name 2', 'Name 3', 'Name 4', 'Name 5']})
print(df.xs(('PARTNER'), level=5, axis=1, drop_level=False))
Output:
Legacy Contact Person Number
Central Data
Header Data
CHAR
10
PARTNER
0 Test 1
1 None
2 Test 3
3 Test 4
4 Test 5
However, you can't use xs
to set data. This is possible with an IndexSlice
:
idx = pd.IndexSlice
df.loc[:, idx[:, :, :, :, :, 'PARTNER']] = df.loc[:, idx[:, :, :, :, :, 'PARTNER']].fillna('Missing')
Result:
Legacy Contact Person Number Business partner category
Central Data Central Data
Header Data Header Data
CHAR CHAR
10 1
PARTNER TYPE
0 Test 1 Name 1
1 Missing Name 2
2 Test 3 Name 3
3 Test 4 Name 4
4 Test 5 Name 5
Instead of df.loc[:, idx[:, :, :, :, :, 'PARTNER']]
you can also use df.loc(axis=1)[:, :, :, :, :, 'PARTNER']
.