Good afternoon, I hope that you are well.
I have an xlsx file in the following format that which is output from a Python function I have been using to parse data :-
I have loaded this xlsx file into a pandas df in an attempt to achieve the following output :-
The requirements that I am trying to satisfy are :- For each row in the dataframe, if the the "Application ID" and "Test Phase" column values match, then I would like to add the row values for those column values together and replace the original matched rows with one row containing the summed values.
Where there is match in the column values, the original row should remain in place.
If there any pointers on how to achieve this, it would be much appreciated. I have attempted to achieve this code in the function prior to writing the values to the source xlsx output file however I assumed it would be easier to achieve by working with pandas / numpy.
Many thanks in advance Jimmy
Use groupby_sum
:
out = df.groupby(['Application ID', 'Test Phase'], as_index=False).sum()
print(out)
# Output
Application ID Test Phase Total Tests A
0 9 SIT 36 36
1 11 UAT 5 5
Setup:
data = {'Application ID': [9, 9, 11],
'Test Phase': ['SIT', 'SIT', 'UAT'],
'Total Tests': [9, 27, 5],
'A': [9, 27, 5]}
df = pd.DataFrame(data)
print(df)
# Output
Application ID Test Phase Total Tests A
0 9 SIT 9 9
1 9 SIT 27 27
2 11 UAT 5 5