Search code examples
pythonpandasdataframemulti-index

Copy an N-Index column of a Pandas Dataframe including all sub-headers


This question expands off of a previous question of mine: Copy a Multi-Index column of a Pandas Dataframe including the second header

EDIT - Clarifying that one of my issues deals with the discrepancy when N=1 (produces an Index dataframe) and when N>1 (produces a MultiIndex dataframe...)

Background - I have a dataset with N headers that I've read from a CSV. N can be anything greater than or equal to 1...

e.g. N=1:

df = pd.read_csv(file, header=[0:N])
print(df)

     A   B   C 
---------------  (Single-)Index dataframe
0    1   2   3
1    4   5   6
2    7   8   9
       ...

e.g. N=3:

df = pd.read_csv(file, header=[0:N])
print(df)

     A   B   C
     a   b   c
     ɑ   β   𝛾   
---------------  MultiIndex dataframe
0    1   2   3
1    4   5   6
2    7   8   9
       ...

What I Want - I'd like to duplicate one of the columns (say, A) into a new column D, such that its sub-headers are copied as-is for an arbitrary (N) number of index headers...

     A   B   C   D
  ...N header rows...
     a   b   c   a
     ɑ   β   𝛾   ɑ
----------------------
0    1   2   3   1
1    4   5   6   4
2    7   8   9   7
       ...

Current Solution - ...based on answers to my original question, I can do this with a fixed number of header rows, N...

e.g. N=2:

df[('D',df['A'].columns[0])] = df['A'] 

But this won't work for a normal, single-Index dataframe, since df['A'] won't have a .columns attribute...

Need For Extension - ...I'm unsure what the most pythonic way to handle N rows might be... presently I've made an if case for N=1 and N=2, but that feels clunky...

I wonder if there's a way to force a single-Index dataframe to be a MultiIndex dataframe with a single index row?


Solution

  • You can try

    df[('D',) + df['A'].columns[0]] = df['A']
    
    print(df)
    
       A  B  C  D
       a  b  c  a
       ɑ  β  𝛾  ɑ
    0  1  2  3  1
    1  4  5  6  4
    2  7  8  9  7
    

    If there is multiple second level columns under A

    df[[('D',) + col for col in df['A'].columns]] = df['A']
    
    print(df)
    
       A     B  C  D
       a  b  b  c  a  b
       ɑ  β  β  𝛾  ɑ  β
    0  1  0  2  3  1  0
    1  4  0  5  6  4  0
    2  7  0  8  9  7  0