python pandas dataframe sktime

Convert all rows into a Series object pandas


I have a dataframe like so:

time       0           1           2           3           4           5    
0   3.477110    3.475698    3.475874    3.478345    3.476757    3.478169    
1   3.422223    3.419752    3.417987    3.421341    3.418693    3.418340    
2   3.474110    3.474816    3.477463    3.479757    3.479581    3.476757    
3   3.504995    3.507112    3.504995    3.505877    3.507112    3.508171    
4   3.426106    3.424870    3.422399    3.421517    3.419046    3.417105    
6   3.364336    3.362571    3.360453    3.358335    3.357806    3.356924
7   3.364336    3.362571    3.360453    3.358335    3.357806    3.356924
8   3.364336    3.362571    3.360453    3.358335    3.357806    3.356924

but sktime requires the data to be in a format where each dataframe entry is a separate time series:

3.477110,3.475698,3.475874,3.478345,3.476757,3.478169   
3.422223,3.419752,3.417987,3.421341,3.418693,3.418340   
3.474110,3.474816,3.477463,3.479757,3.479581,3.476757   
3.504995,3.507112,3.504995,3.505877,3.507112,3.508171   
3.426106,3.424870,3.422399,3.421517,3.419046,3.417105   
3.364336,3.362571,3.360453,3.358335,3.357806,3.356924

Essentially, as I have 6 columns of data, each row should become a separate series (of length 6), and the final shape should be (9, 1) (for this example) instead of the (9, 6) it is right now.

I have tried iterating over the rows and using various transform techniques, but to no avail. I am looking for something similar to the .squeeze() method, but one that works for multiple data points. How does one go about it?


Solution

  • I think you want something like this.

    import numpy as np

    # assumes df is the dataframe from the question and 'time' is a regular column
    result = df.set_index('time').apply(np.array, axis=1)
    print(result)
    print(type(result))
    print(result.shape)
    
    time
    0    [3.47711, 3.475698, 3.475874, 3.478345, 3.4767...
    1    [3.422223, 3.419752, 3.417987, 3.421341, 3.418...
    2    [3.47411, 3.474816, 3.477463, 3.479757, 3.4795...
    3    [3.504995, 3.507112, 3.504995, 3.505877, 3.507...
    4    [3.426106, 3.42487, 3.422399, 3.421517, 3.4190...
    6    [3.364336, 3.362571, 3.360453, 3.358335, 3.357...
    7    [3.364336, 3.362571, 3.360453, 3.358335, 3.357...
    8    [3.364336, 3.362571, 3.360453, 3.358335, 3.357...
    dtype: object
    <class 'pandas.core.series.Series'>
    (8,)
    

    This is one pd.Series of length 8 (index 5 is missing from your example data ;) ) and each value of the Series is an np.array. You can also pass list instead of np.array in the apply statement if you want.
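
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the approach above. It assumes 'time' is an ordinary column of the frame. The final to_frame step and the column name "dim_0" are my own illustration of how to get the (n, 1) shape asked for, not something sktime is confirmed here to require.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (index 5 is absent, so there are 8 rows).
values = [
    [3.477110, 3.475698, 3.475874, 3.478345, 3.476757, 3.478169],
    [3.422223, 3.419752, 3.417987, 3.421341, 3.418693, 3.418340],
    [3.474110, 3.474816, 3.477463, 3.479757, 3.479581, 3.476757],
    [3.504995, 3.507112, 3.504995, 3.505877, 3.507112, 3.508171],
    [3.426106, 3.424870, 3.422399, 3.421517, 3.419046, 3.417105],
    [3.364336, 3.362571, 3.360453, 3.358335, 3.357806, 3.356924],
    [3.364336, 3.362571, 3.360453, 3.358335, 3.357806, 3.356924],
    [3.364336, 3.362571, 3.360453, 3.358335, 3.357806, 3.356924],
]
df = pd.DataFrame(values)
df.insert(0, "time", [0, 1, 2, 3, 4, 6, 7, 8])  # assumption: 'time' is a column

# One NumPy array per row, collected into a single object-dtype Series.
result = df.set_index("time").apply(np.array, axis=1)

# Same idea with plain Python lists as the cell values.
as_lists = df.set_index("time").apply(list, axis=1)

# Hypothetical follow-up: wrap the Series in a one-column frame to get shape (8, 1).
# The column name "dim_0" is only an illustrative choice.
nested = result.to_frame(name="dim_0")

print(result.shape)   # (8,)
print(nested.shape)   # (8, 1)
```

Each cell of result holds the six readings of one row, so result.loc[3] gives back the array for time 3.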