Tags: python-3.x, pandas, dataframe, aggregate

pandas: aggregate a column of lists into one list


I have the following data frame my_df:

name         numbers
----------------------
A             [4,6]
B             [3,7,1,3]
C             [2,5]
D             [1,2,3]

I want to combine all numbers to a new list, so the output should be:

 new_numbers
---------------
[4,6,3,7,1,3,2,5,1,2,3]

And here is my code:

def combine_list(my_lists):
    new_list = []
    for x in my_lists:
        new_list.append(x)

    return new_list

new_df = my_df.agg({'numbers': combine_list})

But new_df still looks the same as the original:

              numbers
----------------------
0             [4,6]
1             [3,7,1,3]
2             [2,5]
3             [1,2,3]

What did I do wrong? How do I make new_df look like this:

 new_numbers
---------------
[4,6,3,7,1,3,2,5,1,2,3]

Thanks!


Solution

  • You need to flatten the values and then create a new DataFrame with the constructor (df below is the question's my_df):

    flatten = [item for sublist in df['numbers'] for item in sublist]
    

    Or:

    import numpy as np
    
    flatten = np.concatenate(df['numbers'].values).tolist()
    

    Or:

    from itertools import chain
    
    flatten = list(chain.from_iterable(df['numbers'].values.tolist()))
    

    Then pass the flattened list to the constructor:

    import pandas as pd
    
    df1 = pd.DataFrame({'numbers':[flatten]})
    

    print(df1)
                                 numbers
    0  [4, 6, 3, 7, 1, 3, 2, 5, 1, 2, 3]
    

    Timings are here.
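
For completeness, here is a minimal self-contained sketch that puts the pieces above together. It assumes the numbers column holds plain Python lists, as in the question; the frame is rebuilt from the table shown there, and the names flat_loop, flat_comp, flat_np and flat_chain are made up for illustration. It also includes a variant of the question's combine_list that uses extend instead of append (append adds each sub-list as a single element, so it cannot flatten). The function is called on the column directly rather than through agg, since how agg invokes a custom function may differ between pandas versions (an assumption, not verified here).

```python
from itertools import chain

import numpy as np
import pandas as pd

# Sample frame rebuilt from the table in the question.
my_df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "numbers": [[4, 6], [3, 7, 1, 3], [2, 5], [1, 2, 3]],
})


def combine_list(my_lists):
    """Flatten an iterable of lists into one list."""
    new_list = []
    for x in my_lists:
        # extend() adds the items of x; append() would add x itself.
        new_list.extend(x)
    return new_list


# Variant of the question's function, called on the column directly.
flat_loop = combine_list(my_df["numbers"])

# The three approaches from the answer.
flat_comp = [item for sublist in my_df["numbers"] for item in sublist]
flat_np = np.concatenate(my_df["numbers"].values).tolist()
flat_chain = list(chain.from_iterable(my_df["numbers"]))

# All four should produce the same flat list.
assert flat_loop == flat_comp == flat_np == flat_chain

# Wrap the flat list in an outer list so it becomes a single cell.
new_df = pd.DataFrame({"new_numbers": [flat_loop]})
print(new_df)
```

The outer brackets in `[flat_loop]` matter: without them the constructor would spread the values over eleven rows instead of storing one list in one row.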