Search code examples
pythonpandasdataframe

Pandas: Combine 2 Rows If a Key Column is Empty


I have a dataframe:

Key Name Amount Condition Packing
A1 Candy A 25 Good 25 Nice
B1 Candy B 50 Good 49 Nice
Good 1 Damaged
C1 Candy C 40 Good 40 Nice

I want my dataframe to be like this: If the key column is empty, the row will be added into the nearest upper row with a key.

Key Name Amount Condition Packing
A1 Candy A 25 Good 25 Nice
B1 Candy B 50 Good,Good 49 Nice, 1 Damaged
(The entire empty row will be deleted)
C1 Candy C 40 Good 40 Nice

And then for the final, if the data is duplicate (Good,Good) I want it to be only showing 1 (Good),

Final Dataframe:

Key Name Amount Condition Packing
A1 Candy A 25 Good 25 Nice
B1 Candy B 50 Good 49 Nice, 1 Damaged
C1 Candy C 40 Good 40 Nice

Any Idea how to achieve this?

Edit:
I imported a data from a .ods file

            path = QFileDialog.getOpenFileName(self, 'Open ODS', os.getenv('HOME'), 'ODS(*.ods)')[0]
            all_data = pd.read_excel(path, skiprows=range(1, 6))
            if all_data.size == 0:
                return
            print(all_data)

and then I tried fillna()

            all_data.fillna('', inplace = True)
            self.main_screen_product_tableWidget.setRowCount(all_data.shape[0])
            self.main_screen_product_tableWidget.setColumnCount(all_data.shape[1])
            self.main_screen_product_tableWidget.setHorizontalHeaderLabels(["Key","Name",
            "Amount","Condition","Packing"])

This is how I show the data into QTableWidget

                for row in all_data.iterrows():
                    values = row[1]
                    for col_index, value in enumerate(values):
                        tableItem = QTableWidgetItem(str(value))
                        self.main_screen_product_tableWidget.setItem(row[0], 
                        col_index, tableItem)
                

Solution

  • You could use a custom aggregation function in groupby.agg:

    def custom_agg(s):
        s = s.dropna().drop_duplicates()
        if len(s) > 1:
            return ', '.join(s.astype(str))
        return s
    
    out = df.groupby(df['Key'].notna().cumsum(), as_index=False).agg(custom_agg)
    

    NB. You could also use df['Key'].ffill() in place of df['Key'].notna().cumsum() as grouper.

    If there is a possibility that you have multiple numeric values and want to handle them differently (e.g. getting the sum), you could use:

    def custom_agg(s):
        if pd.api.types.is_numeric_dtype(s):
            return s.sum()
        s = s.dropna().drop_duplicates()
        if len(s) > 1:
            return ', '.join(s.astype(str))
        return s
    

    Output:

      Key     Name  Amount Condition             Packing
    0  A1  Candy A    25.0      Good             25 Nice
    1  B1  Candy B    50.0      Good  49 Nice, 1 Damaged
    2  C1  Candy C    40.0      Good             40 Nice