Sorry if this is a repeated question, but I have struggled to find an existing thread with a solution that works for my problem.
I am working with a dataset that looks something like this
df = pd.DataFrame(data={"product":["crisps", "crisps", "crisps",
"bread", "bread", "bread",
"pasta","pasta"],
"ingredients": ["potato", "oil", "salt",
"flour", "salt", "water",
"flour", "eggs"]})
But I need the dataset in this format
I know this can be done using an SQL query, but is there a way of doing this in python/pandas? The actual dataset I am working with contains over 100,000 different food products so the solution will need to be scalable.
Absolutely any advice would be much appreciated!
Try:
df.groupby('product', as_index=False)[['ingredients']].agg(list)
Output:
product ingredients
0 bread [flour, salt, water]
1 crisps [potato, oil, salt]
2 pasta [flour, eggs]