This question expands off of a previous question of mine: Copy a Multi-Index column of a Pandas Dataframe including the second header
EDIT - Clarifying that one of my issues deals with the discrepancy when N=1
(produces an Index
dataframe) and when N>1
(produces a MultiIndex
dataframe...)
Background - I have a dataset with N
headers that I've read from a CSV. N
can be anything greater than or equal to 1...
e.g. N=1
:
df = pd.read_csv(file, header=[0:N])
print(df)
A B C
--------------- (Single-)Index dataframe
0 1 2 3
1 4 5 6
2 7 8 9
...
e.g. N=3
:
df = pd.read_csv(file, header=[0:N])
print(df)
A B C
a b c
ɑ β 𝛾
--------------- MultiIndex dataframe
0 1 2 3
1 4 5 6
2 7 8 9
...
What I Want - I'd like to duplicate one of the columns (say, A) into a new column D, such that its sub-headers are copied as-is for an arbitrary (N
) number of index headers...
A B C D
...N header rows...
a b c a
ɑ β 𝛾 ɑ
----------------------
0 1 2 3 1
1 4 5 6 4
2 7 8 9 7
...
Current Solution - ...based on answers to my original question, I can do this with a fixed number of header rows, N
...
e.g. N=2
:
df[('D',df['A'].columns[0])] = df['A']
But this won't work for a normal, single-Index dataframe, since df['A']
won't have a .columns
attribute...
Need For Extension - ...I'm unsure what the most pythonic way to handle N
rows might be... presently I've made an if
case for N=1
and N=2
, but that feels clunky...
I wonder if there's a way to force a single-Index
dataframe to be a MultiIndex
dataframe with a single index row?
You can try
df[('D',) + df['A'].columns[0]] = df['A']
print(df)
A B C D
a b c a
ɑ β 𝛾 ɑ
0 1 2 3 1
1 4 5 6 4
2 7 8 9 7
If there is multiple second level columns under A
df[[('D',) + col for col in df['A'].columns]] = df['A']
print(df)
A B C D
a b b c a b
ɑ β β 𝛾 ɑ β
0 1 0 2 3 1 0
1 4 0 5 6 4 0
2 7 0 8 9 7 0