Search code examples
pythonpandasdataframemergemultiple-conditions

Python add / merge rows of a dataframe together based on multiple conditions


Good afternoon, I hope that you are well.

I have an xlsx file in the following format that which is output from a Python function I have been using to parse data :-

enter image description here

I have loaded this xlsx file into a pandas df in an attempt to achieve the following output :-

enter image description here

The requirements that I am trying to satisfy are :- For each row in the dataframe, if the the "Application ID" and "Test Phase" column values match, then I would like to add the row values for those column values together and replace the original matched rows with one row containing the summed values.

Where there is match in the column values, the original row should remain in place.

If there any pointers on how to achieve this, it would be much appreciated. I have attempted to achieve this code in the function prior to writing the values to the source xlsx output file however I assumed it would be easier to achieve by working with pandas / numpy.

Many thanks in advance Jimmy


Solution

  • Use groupby_sum:

    out = df.groupby(['Application ID', 'Test Phase'], as_index=False).sum()
    print(out)
    
    # Output
       Application ID Test Phase  Total Tests   A
    0               9        SIT           36  36
    1              11        UAT            5   5
    

    Setup:

    data = {'Application ID': [9, 9, 11],
            'Test Phase': ['SIT', 'SIT', 'UAT'],
            'Total Tests': [9, 27, 5],
            'A': [9, 27, 5]}
    df = pd.DataFrame(data)
    print(df)
    
    # Output
       Application ID Test Phase  Total Tests   A
    0               9        SIT            9   9
    1               9        SIT           27  27
    2              11        UAT            5   5