python arrays pandas dataframe concatenation

Python pandas create a new row by combining a variable and a list of variables for each row and appending


I have a list of data frames called date_group, each of which looks like the ones below. There are 117 such frames, one for each day.

date_group[0] looks like this.

     x1  x2   x3    prob      x5          date
0   1.0 1.0 20.0    0.05    90.0    2021-12-23
1   1.0 2.0 20.0    0.60    90.0    2021-12-23
2   1.0 3.0 20.0    0.01    90.0    2021-12-23
3   1.0 4.0 20.0    0.02    90.0    2021-12-23
4   1.0 5.0 20.0    0.01    90.0    2021-12-23
...
30 .............................................

date_group[1] looks like this.

     x1  x2   x3    prob      x5          date
0   1.0 1.0 20.0    0.04    90.0    2021-12-24
1   1.0 2.0 20.0    0.61    90.0    2021-12-24
2   1.0 3.0 20.0    0.02    90.0    2021-12-24
3   1.0 4.0 20.0    0.03    90.0    2021-12-24
4   1.0 5.0 20.0    0.01    90.0    2021-12-24
...
29 ............................................

Within each frame, the date is the same in every row, but the prob values (probabilities) differ.

I am trying to create a single data frame in the format below. For each frame, I want to take the date and the entire prob column (30 values per frame) and put them in their own row for the corresponding date.

          date    0    1    2    3    4 ........ 29
1.  2021-12-23 0.05 0.60 0.01 0.02 0.01 ...........
2.  2021-12-24 0.04 0.61 0.02 0.03 0.01 ...........
.....

So far I have written this code.

feed = pd.DataFrame(columns=['date',0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29])

for i in range(0, len(date_group)):
    feed.loc[i,'date'] = date_group[i].date.iloc[0]
    feed.loc[i] = date_group[i]['prob'].reset_index()['prob']

But my output looks like this:

       date     0     1     2     3     4     5     6     7     8   ...    20    21    22    23    24    25    26    27    28    29
0       NaN   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0  0.02   0.0   0.0   0.0   0.0   0.0  0.05
1       NaN   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0  0.02   0.0   0.0   0.0   0.0   0.0  0.05
2       NaN   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0  0.02   0.0   0.0   0.0   0.0   0.0  0.05
3       NaN   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0  0.02   0.0   0.0   0.0   0.0   0.0  0.05
4       NaN   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0  0.01   0.0   0.0   0.0   0.0   0.0  0.04
...     ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...
112     NaN  0.32   0.5  0.68  0.79  0.98  0.79  0.99  0.99  0.99   ...  0.99   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0
113     NaN  0.29  0.45  0.62  0.72  0.89  0.72   1.0   1.0   1.0   ...   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0
114     NaN  0.26  0.39  0.55  0.74   0.9  0.74   1.0   1.0   1.0   ...   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0
115     NaN  0.24  0.34  0.49  0.76  0.91  0.76   1.0   1.0   1.0   ...   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0
116     NaN  0.21  0.29  0.43  0.68  0.82  0.68   1.0   0.9   1.0   ...   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0

The dates are empty but everything else is there.

How do I get the dates in as well and make the frame look like I want it to in my example above?


Solution

  • Here is one way to do it.

    Concatenate all the DataFrames, then use pivot to build the desired table. Here the first DataFrame is named df and the second df2, and they are combined into df3.

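    import pandas as pd

    # create a list of all the DFs; reset_index() turns each frame's
    # row labels (0, 1, 2, ...) into an 'index' column for the pivot
    df_list = [df.reset_index(),
               df2.reset_index()]

    # stack the frames on top of each other
    df3 = pd.concat(df_list)

    # one row per date, one column per original row label
    df3.pivot(index='date',
              columns='index',
              values='prob').reset_index()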

    Or, without creating df3:

    # create a list of all the DFs
    df_list = [df.reset_index(),
               df2.reset_index()]

    pd.concat(df_list).pivot(index='date',
                             columns='index',
                             values='prob').reset_index()
    
    index        date     0     1     2     3     4
    0      2021-12-23  0.05  0.60  0.01  0.02  0.01
    1      2021-12-24  0.04  0.61  0.02  0.03  0.01
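
The same approach can be applied to the whole date_group list from the question instead of two hand-named frames. The following is a minimal sketch, not a definitive implementation: the two small frames are made-up stand-ins for the real 117 frames, and the names stacked and feed are chosen here for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the question's `date_group`: two small frames,
# each holding a single date and a column of probabilities.
date_group = [
    pd.DataFrame({"prob": [0.05, 0.60, 0.01], "date": "2021-12-23"}),
    pd.DataFrame({"prob": [0.04, 0.61, 0.02], "date": "2021-12-24"}),
]

# reset_index() turns each frame's row labels (0, 1, 2, ...) into an
# "index" column; after the pivot those labels become the column headers.
stacked = pd.concat([g.reset_index() for g in date_group], ignore_index=True)

feed = (
    stacked.pivot(index="date", columns="index", values="prob")
    .rename_axis(None, axis=1)  # drop the leftover "index" name on the columns
    .reset_index()              # move "date" back into an ordinary column
)
print(feed)
```

Two things to keep in mind with this sketch: pivot raises an error if the same (date, index) pair appears more than once, and if some frames have fewer rows than others, the missing positions come out as NaN. As for the loop in the question, the second assignment (feed.loc[i] = ...) most likely overwrites the entire row, including the date that was set on the line before, which would explain the NaN dates.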