I am sorry for asking but I did not get the still existing answers. I simply glued two data frames with the same column names.
| | X | Y | X | Y |
|---:|----:|----:|----:|----:|
| 0 | 1 | 3 | 9 | 7 |
| 1 | 2 | 4 | 8 | 6 |
What I want is
| | FOO | BAR |
| | X | Y | X | Y |
|---:|----:|----:|----:|----:|
| 0 | 1 | 3 | 9 | 7 |
| 1 | 2 | 4 | 8 | 6 |
I tried pd.MultiIndex.from_product([c.columns, ['FOO', 'BAR']])
but this results in
MultiIndex([('X', 'FOO'),
('X', 'BAR'),
('Y', 'FOO'),
('Y', 'BAR'),
('X', 'FOO'),
('X', 'BAR'),
('Y', 'FOO'),
('Y', 'BAR')],
)
But I need
MultiIndex([('X', 'FOO'),
('Y', 'FOO'),
('X', 'BAR'),
('Y', 'BAR')],
)
This is an MWE
#!/usr/bin/env python3
import pandas as pd
a = pd.DataFrame({'X': [1,2], 'Y': [3, 4]})
b = pd.DataFrame({'X': [9,8], 'Y': [7, 6]})
c = pd.concat([a, b], axis=1)
# throws a ValueError: Length mismatch: Expected axis has 4 elements, new values have 8 elements
c.columns = pd.MultiIndex.from_product([c.columns, ['FOO', 'BAR']])
Would it help to do something to the two separate DataFrames before I concat()
them?
You could simply add the extra level using numpy.repeat
and pandas.MultiIndex
:
import numpy as np
extra = ['FOO', 'BAR']
c.columns = pd.MultiIndex.from_arrays([np.repeat(extra, len(c.columns)//len(extra)),
c.columns])
output:
FOO BAR
X Y X Y
0 1 3 9 7
1 2 4 8 6
NB. If the columns are shuffled, sort the columns and use np.tile
instead:
c = c.sort_index(axis=1)
extra = ['FOO', 'BAR']
c.columns = pd.MultiIndex.from_arrays([np.tile(extra, len(c.columns)//len(extra)),
c.columns])
output:
FOO BAR FOO BAR
X X Y Y
0 1 9 3 7
1 2 8 4 6