Search code examples
pythonpandasdataframerollup

Roll up rows in pandas data frame?


Sorry if this is a repeated question, but I have struggled to find an existing thread with a solution that works for my problem.

I am working with a dataset that looks something like this

df = pd.DataFrame(data={"product":["crisps", "crisps", "crisps",
                                   "bread", "bread", "bread",
                                   "pasta","pasta"], 
                   "ingredients": ["potato", "oil", "salt", 
                                   "flour", "salt", "water", 
                                   "flour", "eggs"]})

Raw dataset format:
Raw dataset format

But I need the dataset in this format

Rolled up dataset:
Rolled up dataset

I know this can be done using an SQL query, but is there a way of doing this in python/pandas? The actual dataset I am working with contains over 100,000 different food products so the solution will need to be scalable.

Absolutely any advice would be much appreciated!


Solution

  • Try:

    df.groupby('product', as_index=False)[['ingredients']].agg(list)
    

    Output:

      product           ingredients
    0   bread  [flour, salt, water]
    1  crisps   [potato, oil, salt]
    2   pasta         [flour, eggs]