I have a dataframe with multiindex hierarchical colmn names, including empty strings as column index names. How to subset second and third columns?
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(15).reshape(5,3),
index=[1,2,3,4,5],
columns=[['A', 'A', 'B'],
['a', 'b', ''],
['', 'x', '']]
)
df.columns.names = ["c_ix0", "c_ix1", "c_ix2"]
print(df)
c_ix0 A B
c_ix1 a b
c_ix2 x
1 0 1 2
2 3 4 5
3 6 7 8
4 9 10 11
5 12 13 14
expected output:
c_ix0 A B
c_ix1 b
c_ix2 x
1 1 2
2 4 5
3 7 8
4 10 11
5 13 14
I believe you need xs
:
a = df.xs('b', axis=1, level=1)
print (a)
c_ix0 A
c_ix2 x
1 1
2 4
3 7
4 10
5 13
b = df.xs('B', axis=1, level=0)
print (b)
c_ix1
c_ix2
1 2
2 5
3 8
4 11
5 14
If want select by positions use iloc
:
c = df.iloc[:, 1]
print (c)
1 1
2 4
3 7
4 10
5 13
Name: (A, b, x), dtype: int32
EDIT:
d = df.iloc[:, [1, 2]]
print (d)
c_ix0 A B
c_ix1 b
c_ix2 x
1 1 2
2 4 5
3 7 8
4 10 11
5 13 14