Search code examples
pythonpython-3.xpandasdataframedata-preprocessing

Create a DataFrame from present Dataframe with multiple conditions


I have a dataframe as given below.

data = {'Participant':['A', 'B', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D'],
    'Total test Result':[1, 4, 4, 4, 4, 2, 2, 3, 3, 3], 
    'result' : ['negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', ], 
    'time': ['2021-06-14', '2021-06-21', '2021-06-24', '2021-06-28', '2021-07-01', '2021-07-05', '2021-07-08', '2021-06-17', '2021-06-17', '2021-06-20'] }
pres_df = pd.DataFrame(data)
pres_df

Note: 'time' column is in DateTime format if it helps.

enter image description here

I want to create a new dataframe in which the multiple values of 'Participant' are consolidated to 1 row with the creation of multiple rows of time and result. The required final result is given below as in how it should look.

enter image description here Any help is greatly appreciated. Thanks.


Solution

  • You can use pd.pivot_table:

    df.rename(columns={'time':'date'},inplace=True)
    df = df.assign(test_res = 'Test' + df.groupby('Participant').cumcount().add(1).astype(str))
    df1 = df.pivot_table(index=['Participant','Total test Result'], 
                                          columns=['test_res'],
                                          values=['date','result'],
                                          aggfunc = 'first'
                                          )
    df1.columns = df1.columns.map(lambda x: f"{x[1]}_{x[0]}" if ('Test' in x[1]) else x[0])
    df1 = df1[sorted(df1.columns)].reset_index()
    

    df1: enter image description here