I've got a DataFrame called product
with a list of orders, products, and quantities for each product. Here's a screenshot:
I need to make a new DataFrame that has a row for each product name and two more columns with the sum of products ordered (basically a sum on the column quantity
per product) and the total sales for each product (sum on column total
per product).
I made this function:
products_unique = products['product_id'].unique()
names = [
products.loc[
products['product_id'] == elem
]['name'].unique()
for elem in products_unique
]
orders = [
len(products.loc[
products['product_id'] == elem
])
for elem in products_unique
]
totals = [
products.loc[
products['product_id'] == elem
]['total'].sum()
for elem in products_unique
]
chart_data = pd.DataFrame({
'Prodotti': products_unique,
'Nome': names,
'Ordini': orders,
'Totale': totals
})
Now, this function works for the purpose I had, but there is something I don't understand. When I run it, I got this:
As you can see, values in the column names
are of the type list
. Why does this happen?
And moreover, is there a cleaner way to achieve what I'm building?
Thanks to everyone who gonna help me!
chart_data = products.groupby('product_id').agg({
'name': lambda x: ', '.join(x.unique()),
'total': ['sum', 'count']
})
chart_data.columns = ['Nome', 'Totale', 'Ordini']
chart_data.reset_index(inplace=True)
chart_data.rename(columns={'product_id': 'Prodotti'}, inplace=True)