python pandas dataframe data-science data-munging

Pandas: how to explode a list column into several items per new row


I have a dataframe:

c1   c2   c3   l
1    2    3    [1,2,3,4,5,6,7]
3    4    8    [8,9,0]

I want to explode it so that every 3 elements from each list in column l become a new row, with an extra column holding the triplet's index within the original list. Leftover elements that do not fill a complete triplet (such as the 7 above) are dropped. So I would get:

c1   c2   c3   l          idx
1    2    3    [1,2,3]    0
1    2    3    [4,5,6]    1
3    4    8    [8,9,0]    0

What is the best way to do so?


Solution

  • Break each list into chunks of three first, then explode:

    # split each list into consecutive triplets; an incomplete trailing chunk is dropped
    df.l = df.l.apply(lambda lst: [lst[3*i:3*(i+1)] for i in range(len(lst) // 3)])
    
    df    
    #   c1  c2  c3                       l
    #0   1   2   3  [[1, 2, 3], [4, 5, 6]]
    #1   3   4   8             [[8, 9, 0]]
    
    df.explode('l')
    #   c1  c2  c3          l
    #0   1   2   3  [1, 2, 3]
    #0   1   2   3  [4, 5, 6]
    #1   3   4   8  [8, 9, 0]
    

    If you need the index column:

    # store index as second element of the tuple
    df.l = df.l.apply(lambda lst: [(lst[3*i:3*(i+1)], i) for i in range(len(lst) // 3)])
    
    df    
    #   c1  c2  c3                                 l
    #0   1   2   3  [([1, 2, 3], 0), ([4, 5, 6], 1)]
    #1   3   4   8                  [([8, 9, 0], 0)]
    
    df = df.explode('l')
    df
    #   c1  c2  c3               l
    #0   1   2   3  ([1, 2, 3], 0)
    #0   1   2   3  ([4, 5, 6], 1)
    #1   3   4   8  ([8, 9, 0], 0)
    
    # extract list and index from the tuple column
    df['l'], df['idx'] = df.l.str[0], df.l.str[1]
    df
    #   c1  c2  c3          l  idx
    #0   1   2   3  [1, 2, 3]    0
    #0   1   2   3  [4, 5, 6]    1
    #1   3   4   8  [8, 9, 0]    0
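
For reference, the whole approach can be put together as one self-contained script. This is only a minimal sketch, not the only way to do it: the names `CHUNK`, `chunk_full` and `out` are made up for illustration, and instead of carrying the index inside a tuple it derives `idx` after the explode with `groupby(level=0).cumcount()`. That works because rows coming from the same original row share an index label after `explode`.

```python
import pandas as pd

# Sample data mirroring the question.
df = pd.DataFrame({
    "c1": [1, 3],
    "c2": [2, 4],
    "c3": [3, 8],
    "l": [[1, 2, 3, 4, 5, 6, 7], [8, 9, 0]],
})

CHUNK = 3  # hypothetical name for the chunk size


def chunk_full(lst, size=CHUNK):
    """Split lst into consecutive chunks of `size`, dropping an incomplete tail."""
    return [lst[i:i + size] for i in range(0, len(lst) - size + 1, size)]


# Chunk each list, then turn every chunk into its own row.
out = df.assign(l=df["l"].apply(chunk_full)).explode("l")

# Rows from the same source row still share an index label here, so a
# per-label running count gives the chunk's position in the original list.
out["idx"] = out.groupby(level=0).cumcount().to_numpy()

# Replace the duplicated index labels with a fresh 0..n-1 index.
out = out.reset_index(drop=True)
print(out)
```

Note that a list shorter than the chunk size becomes an empty list, which `explode` turns into a single row with NaN in `l`; follow up with `out.dropna(subset=["l"])` if such rows should be removed.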