Search code examples
pythonpandasdataframeaccelerometer

Best way to take the values of a column that are all related to multiple rows with the same index


I'm currently try to take accelerometer data samples and combine them into a single dataframe for use with machine learning. There are 100 entries per sample saved to csv's and each gets loaded in through a for loop and the merged through a for loop. After that my results look something like this:

ID X Y Z Class
ID 1 4 2 6 Action 1
ID 1 4 2 5 Action 1
ID 2 3 4 4 Action 2
ID 2 2 2 1 Action 2
ID 3 1 4 5 Action 3
ID 3 2 3 2 Action 3
ID 4 2 1 5 Action 1
ID 4 2 2 5 Action 1

I'm wanting to get a wide format table to make the data easier to use with machine learning. I'm trying to get it to look more along the lines of this second table:

ID X1 Y1 Z1 X2 Y2 Z2 Class
ID 1 4 2 6 4 2 5 Action 1
ID 2 3 4 4 2 2 1 Action 2
ID 3 1 4 5 2 3 2 Action 3
ID 4 2 1 5 2 2 5 Action 1

I've looked into melts, stacking and unstacking, groupby options, and for loops where I try to move everything over. Nothing has gotten me close to what I've been trying to accomplish. Just looking for advice on how to proceed or if there is a better method.


Solution

  • Try:

    df['tmp'] = df.groupby('ID').cumcount() + 1
    
    df = df.pivot(index=['ID', 'Class'], columns=['tmp'])
    df.columns = [f'{a}{b}' for a, b in df.columns]
    df = df.reset_index()
    
    print(df)
    

    Prints:

         ID     Class  X1  X2  Y1  Y2  Z1  Z2
    0  ID 1  Action 1   4   4   2   2   6   5
    1  ID 2  Action 2   3   2   4   2   4   1
    2  ID 3  Action 3   1   2   4   3   5   2
    3  ID 4  Action 1   2   2   1   2   5   5