Search code examples
pythondataframepanel

Create panel data from a dataframe


I couldn't find a solution to my problem here and hope you could solve it. Here is my current dataframe:

| Index | Col_1_data | Col_2_data | Col_3_data | Col_1_new_data | Col_2_new_data | Col_3_new_data |
|   1   |     a      |     d      |     g      |     j          |     m          |     p          |
|   2   |     b      |     e      |     h      |     k          |     n          |     q          |
|   3   |     c      |     f      |     i      |     l          |     o          |     r          |

And the idea is to melt it so that it will look like this:

| Index | data | new_data | id_col |
|   1   |    a |     j    | Col_1  |   
|   2   |    b |     k    | Col_1  |  
|   3   |    c |     l    | Col_1  |
|   1   |    d |     m    | Col_2  |
|   2   |    e |     n    | Col_2  |
|   3   |    f |     o    | Col_2  |
....etc....

So make it a panel but without any ID column, rather a partial string that is the same per each column (e.g. Col_1, Col_2, etc.) Is there a way to do it 'pythonically' or do I need to loop and stack?


Solution

  • Another not so good way to doing this -

    import pandas as pd
    import numpy as np
    
    df= pd.read_csv('test.csv')
    
    col1 = ['Index', 'Col_1_data', 'Col_2_data', 'Col_3_data']
    col2 =  ['Index', 'Col_1_new_data', 'Col_2_new_data', 'Col_3_new_data']
    df1  = df[col1].transpose()
    df2 = df[col2]
    df2 = df2.rename(columns= dict(zip(col2,col1))).transpose()
    def update_header(df):
        new_header = df.iloc[0] #grab the first row for the header
        df = df[1:] #take the data less the header row
        df.columns = new_header 
        return df
        
    df1 = update_header(df1)
    df2 = update_header(df2)
    df1['data'] = df1[df1.columns.values].values.tolist()
    df2['new_data'] = df2[df2.columns.values].values.tolist()
    merged_df = df1.merge(df2,left_index=True, right_index=True)
    merged_df = merged_df[['data', 'new_data']]
    final_df = merged_df.apply(pd.Series.explode).reset_index()
    

    output-

    Index   index   data    new_data
    0   Col_1_data  a   j
    1   Col_1_data  b   k
    2   Col_1_data  c   l
    3   Col_2_data  d   m
    4   Col_2_data  e   n
    5   Col_2_data  f   o
    6   Col_3_data  g   p
    7   Col_3_data  h   q
    8   Col_3_data  i   NaN