
Pandas - Unpack column of lists of varying lengths of tuples


I have a Pandas DataFrame named df with an ID column and a list column. Each cell of the list column holds a list containing a variable number of tuples, and all the tuples have the same length. It looks like this:

ID  list
1   [(0,1,2,3),(1,2,3,4),(2,3,4,NaN)]
2   [(NaN,1,2,3),(9,2,3,4)]
3   [(NaN,1,2,3),(9,2,3,4),(A,b,9,c),($,*,k,0)]

And I would like to unpack each list into columns 'A','B','C','D' representing the fixed positions in each tuple.

The result should look like:

ID  A   B   C   D
1   0   1   2   3
1   1   2   3   4
1   2   3   4   NaN
2   NaN 1   2   3
2   9   2   3   4
3   NaN 1   2   3
3   9   2   3   4
3   A   b   9   c
3   $   *   k   0

I have tried df.apply(pd.Series(list)), but it fails because the lists have different lengths on different rows. Do I somehow need to unpack to columns and then transpose by ID?


Solution

    In [38]: (df.groupby('ID')['list']
                .apply(lambda x: pd.DataFrame(x.iloc[0], columns=['A', 'B', 'C', 'D']))
                .reset_index())
    Out[38]: 
       ID  level_1    A  B  C    D
    0   1        0    0  1  2    3
    1   1        1    1  2  3    4
    2   1        2    2  3  4  NaN
    3   2        0  NaN  1  2    3
    4   2        1    9  2  3    4
    5   3        0  NaN  1  2    3
    6   3        1    9  2  3    4
    7   3        2    A  b  9    c
    8   3        3    $  *  k    0
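
The level_1 column in the output above is the position of each tuple within its list; if it is not needed, it can be removed with .drop(columns='level_1'). As an alternative sketch that avoids groupby, the same reshaping can be done with DataFrame.explode followed by expanding the tuples into columns. This is not the answer's method, only a possible variant. It assumes pandas 1.1 or newer (for the ignore_index argument), and the example frame is rebuilt here from the question's sample data, with the NaN entries represented as np.nan and the letters and symbols as strings.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption about the exact values).
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "list": [
        [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, np.nan)],
        [(np.nan, 1, 2, 3), (9, 2, 3, 4)],
        [(np.nan, 1, 2, 3), (9, 2, 3, 4), ("A", "b", 9, "c"), ("$", "*", "k", 0)],
    ],
})

# One row per tuple; the ID is repeated for every tuple in its list.
exploded = df.explode("list", ignore_index=True)

# Expand each tuple into four positional columns.
unpacked = pd.DataFrame(exploded["list"].tolist(), columns=["A", "B", "C", "D"])

# Both frames share the same 0..n-1 index, so a column-wise concat lines them up.
result = pd.concat([exploded[["ID"]], unpacked], axis=1)
print(result)
```

Because the sample mixes numbers, strings, and NaN, columns A through D come out with object dtype; with purely numeric tuples they would be numeric.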