
How should you name columns when transposing a dataframe?


I am initialising a dataframe with lists, having followed the advice here. I then need to transpose the dataframe.

In the first example I take the column names from the lists used to initialise the dataframe.

In the second example I add the column names last.

-> Is there any difference between these examples?

-> Is there a standard or better way of naming columns of dataframes initialised like this?

import pandas as pd

p_id = ['a_1','a_2']
p = ['a','b']
p_id.insert(0,'p_id')
p.insert(0,'p')

df = pd.DataFrame([p_id, p])
df = df.transpose()
df.columns = df.iloc[0]
df = df[1:]
df

>>>
0   p_id    p
1   a_1     a
2   a_2     b
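If the header really does arrive as the first element of each list, the header-promotion steps of the first example can be written so that the rows are renumbered from 0 and the column index gets no leftover name. This is a minimal sketch, not the asker's code; `header` is just an illustrative variable name:

```python
import pandas as pd

p_id = ['a_1', 'a_2']
p = ['a', 'b']
p_id.insert(0, 'p_id')   # header label first, as in the first example
p.insert(0, 'p')

df = pd.DataFrame([p_id, p]).transpose()

# Take the first row as a plain list: assigning a list (rather than the
# Series df.iloc[0]) leaves the column index without a name.
header = df.iloc[0].tolist()

# Drop the header row and renumber the remaining rows from 0.
df = df.iloc[1:].reset_index(drop=True)
df.columns = header
print(df)
```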

p_id = ['a_1','a_2']
p = ['a','b']

df = pd.DataFrame([p_id, p])
df = df.transpose()
df.columns = ['p_id', 'p']
df

>>>
    p_id    p
0   a_1     a
1   a_2     b
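On the second question: when each list holds the values of one column, a common pandas idiom is to skip the transpose and pass a dict that maps column names to lists, so the names are set when the frame is built. A minimal sketch, assuming the lists have equal length (pandas raises a ValueError otherwise); the `zip` variant is an alternative for data that comes as rows:

```python
import pandas as pd

p_id = ['a_1', 'a_2']
p = ['a', 'b']

# Dict keys become the column names and each list becomes one column,
# so there is no header row to strip and nothing to transpose.
df = pd.DataFrame({'p_id': p_id, 'p': p})
print(df)

# If the data comes as rows instead, zip the lists into row tuples and
# name the columns with the `columns` argument.
df_rows = pd.DataFrame(list(zip(p_id, p)), columns=['p_id', 'p'])
print(df_rows)
```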

Solution

  • Yes, there is a difference in the indexes (here df is the result of the first example and df1 the result of the second):

    print(df.equals(df1))
    False
    
    print (df.index)
    RangeIndex(start=1, stop=3, step=1)
    
    print (df1.index)
    RangeIndex(start=0, stop=2, step=1)
    
    print (df.index == df1.index)
    [False False]
    

    The solution is to create a default index in df with DataFrame.reset_index; drop=True discards the old index instead of inserting it as a column:

    df = df.reset_index(drop=True)
    
    print(df.equals(df1))
    True
    
    print (df.index)
    RangeIndex(start=0, stop=2, step=1)
    
    print (df1.index)
    RangeIndex(start=0, stop=2, step=1)
    
    print (df.index == df1.index)
    [ True  True]
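The comparison above can be reproduced in one self-contained script. This sketch rebuilds both frames (using ['p_id'] + p_id instead of insert so the original lists stay unchanged for the second frame) and also shows a detail the index check does not reveal: after df.columns = df.iloc[0], the column index keeps the name 0 of the row it was taken from. DataFrame.equals does not compare axis names, so the frames compare equal after reset_index even while that name remains; setting it to None is one optional way to remove it:

```python
import pandas as pd

p_id = ['a_1', 'a_2']
p = ['a', 'b']

# First example: header taken from the lists, then the header row is sliced off.
df = pd.DataFrame([['p_id'] + p_id, ['p'] + p]).transpose()
df.columns = df.iloc[0]
df = df[1:]

# Second example: columns named afterwards.
df1 = pd.DataFrame([p_id, p]).transpose()
df1.columns = ['p_id', 'p']

print(df.index.tolist(), df1.index.tolist())   # [1, 2] [0, 1]
print(df.columns.name, df1.columns.name)       # 0 None

# Renumber the rows; equals() ignores the leftover column-index name.
df = df.reset_index(drop=True)
print(df.equals(df1))                          # True

# Optional: drop the leftover name as well.
df.columns.name = None
```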