I have a dataframe:
Key | Name | Amount | Condition | Packing |
---|---|---|---|---|
A1 | Candy A | 25 | Good | 25 Nice |
B1 | Candy B | 50 | Good | 49 Nice |
Good | 1 Damaged | |||
C1 | Candy C | 40 | Good | 40 Nice |
I want my dataframe to be like this: If the key column is empty, the row will be added into the nearest upper row with a key.
Key | Name | Amount | Condition | Packing |
---|---|---|---|---|
A1 | Candy A | 25 | Good | 25 Nice |
B1 | Candy B | 50 | Good,Good | 49 Nice, 1 Damaged |
(The entire empty row will be deleted) | ||||
C1 | Candy C | 40 | Good | 40 Nice |
And then for the final, if the data is duplicate (Good,Good) I want it to be only showing 1 (Good),
Final Dataframe:
Key | Name | Amount | Condition | Packing |
---|---|---|---|---|
A1 | Candy A | 25 | Good | 25 Nice |
B1 | Candy B | 50 | Good | 49 Nice, 1 Damaged |
C1 | Candy C | 40 | Good | 40 Nice |
Any Idea how to achieve this?
Edit:
I imported a data from a .ods file
path = QFileDialog.getOpenFileName(self, 'Open ODS', os.getenv('HOME'), 'ODS(*.ods)')[0]
all_data = pd.read_excel(path, skiprows=range(1, 6))
if all_data.size == 0:
return
print(all_data)
and then I tried fillna()
all_data.fillna('', inplace = True)
self.main_screen_product_tableWidget.setRowCount(all_data.shape[0])
self.main_screen_product_tableWidget.setColumnCount(all_data.shape[1])
self.main_screen_product_tableWidget.setHorizontalHeaderLabels(["Key","Name",
"Amount","Condition","Packing"])
This is how I show the data into QTableWidget
for row in all_data.iterrows():
values = row[1]
for col_index, value in enumerate(values):
tableItem = QTableWidgetItem(str(value))
self.main_screen_product_tableWidget.setItem(row[0],
col_index, tableItem)
You could use a custom aggregation function in groupby.agg
:
def custom_agg(s):
s = s.dropna().drop_duplicates()
if len(s) > 1:
return ', '.join(s.astype(str))
return s
out = df.groupby(df['Key'].notna().cumsum(), as_index=False).agg(custom_agg)
NB. You could also use df['Key'].ffill()
in place of df['Key'].notna().cumsum()
as grouper.
If there is a possibility that you have multiple numeric values and want to handle them differently (e.g. getting the sum), you could use:
def custom_agg(s):
if pd.api.types.is_numeric_dtype(s):
return s.sum()
s = s.dropna().drop_duplicates()
if len(s) > 1:
return ', '.join(s.astype(str))
return s
Output:
Key Name Amount Condition Packing
0 A1 Candy A 25.0 Good 25 Nice
1 B1 Candy B 50.0 Good 49 Nice, 1 Damaged
2 C1 Candy C 40.0 Good 40 Nice