
Pandas list of list column to Python list


I want to create a list from a Pandas column formatted like so:

import pandas as pd

data = {
    'col': [
        '[[a, b, c], [a,b]]',
        '[[a, b], [c]]',
        '[[x]]'
    ]
}

df = pd.DataFrame(data)

The result should be something like this:

python_list = [[['a', 'b', 'c'], ['a', 'b']],
 [['a', 'b'], ['c']],
 [['x']]]

So basically a List[List[List[str]]]. For a non-nested list this would be something simple like df['col'].str.split().tolist(), but I have no clue how to do it for this case. Thanks a lot in advance!
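For comparison, the flat case mentioned above might look like the sketch below. It assumes a hypothetical column `col` of comma-separated values with no brackets. Note that `str.split()` with no argument splits on whitespace, so `','` is passed explicitly here.

```python
import pandas as pd

# Hypothetical flat column: comma-separated values, no brackets
flat = pd.DataFrame({'col': ['a,b,c', 'a,b', 'x']})

# Split each string on ',' and collect the resulting lists into a Python list
flat_list = flat['col'].str.split(',').tolist()
print(flat_list)  # [['a', 'b', 'c'], ['a', 'b'], ['x']]
```

The nested strings in the question cannot be handled this way, because a single split cannot recover the bracket structure.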


Solution

  • You can try applying a custom function like this:

    def parse_list(s: str):
        stack = []
        for c in s:
            if c == '[':        # marks the beginning of a list
                stack.append(c)
            elif c == ']':      # marks the end of a list
                e = []          # collect the elements of that list
                while True:
                    b = stack.pop()  # pop until we reach the matching '['
                    if b == '[':
                        break
                    e.append(b)      # append the element to the list
                stack.append(e[::-1])  # elements were popped in reverse order
            elif c == ',' or c == ' ':  # separators and spaces are skipped
                continue
            else:
                stack.append(c)  # single-character element; see the note below
        return stack[0]
    
    out = df['col'].apply(parse_list).to_list()
    

    Output:

    [[['a', 'b', 'c'], ['a', 'b']], [['a', 'b'], ['c']], [['x']]]
    

    Note: this only works for single-character elements (a, b, c, ...). To handle general strings, you need to modify it to accumulate characters into a token and treat , as the separator that ends each token.
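One possible modification along the lines of the note is sketched below. It is not the answerer's code: `parse_nested` is a hypothetical name, and it assumes elements are unquoted and never contain `[`, `]` or `,`. Instead of pushing single characters, it buffers characters into a token, flushes the token at each `,`, `[` or `]`, and keeps a stack of the lists currently being filled, so the reverse-and-rebuild step is no longer needed.

```python
def parse_nested(s: str) -> list:
    """Parse strings like '[[foo, bar], [baz]]' into nested lists of str.

    Assumes tokens are unquoted and contain no '[', ']' or ','.
    Whitespace around each token is stripped. Expects at least one '['.
    """
    stack = [[]]   # stack[-1] is the list currently being filled
    token = []     # characters of the token being read

    def flush():
        # Emit the buffered token (if non-empty) into the current list
        word = ''.join(token).strip()
        if word:
            stack[-1].append(word)
        token.clear()

    for c in s:
        if c == '[':           # open a new list inside the current one
            flush()
            new = []
            stack[-1].append(new)
            stack.append(new)
        elif c == ']':         # close the current list
            flush()
            stack.pop()
        elif c == ',':         # separator: end of a token
            flush()
        else:
            token.append(c)    # part of a (possibly multi-character) token
    flush()
    return stack[0][0]         # the outermost bracketed list


print(parse_nested('[[foo, bar], [baz]]'))  # [['foo', 'bar'], ['baz']]
```

It can be applied the same way as before, e.g. `df['col'].apply(parse_nested).to_list()`.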