I have a dataframe as given below.
data = {'Participant':['A', 'B', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D'],
'Total test Result':[1, 4, 4, 4, 4, 2, 2, 3, 3, 3],
'result' : ['negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', ],
'time': ['2021-06-14', '2021-06-21', '2021-06-24', '2021-06-28', '2021-07-01', '2021-07-05', '2021-07-08', '2021-06-17', '2021-06-17', '2021-06-20'] }
pres_df = pd.DataFrame(data)
pres_df
Note: 'time' column is in DateTime format if it helps.
I want to create a new dataframe in which the multiple values of 'Participant' are consolidated to 1 row with the creation of multiple rows of time and result. The required final result is given below as in how it should look.
You can use pd.pivot_table
:
df.rename(columns={'time':'date'},inplace=True)
df = df.assign(test_res = 'Test' + df.groupby('Participant').cumcount().add(1).astype(str))
df1 = df.pivot_table(index=['Participant','Total test Result'],
columns=['test_res'],
values=['date','result'],
aggfunc = 'first'
)
df1.columns = df1.columns.map(lambda x: f"{x[1]}_{x[0]}" if ('Test' in x[1]) else x[0])
df1 = df1[sorted(df1.columns)].reset_index()