I want to create a nested list from a pandas column whose values are strings formatted like this:
import pandas as pd

data = {
    'col': [
        '[[a, b, c], [a,b]]',
        '[[a, b], [c]]',
        '[[x]]'
    ]
}
df = pd.DataFrame(data)
The result should be something like this:
python_list = [[['a', 'b', 'c'], ['a', 'b']],
               [['a', 'b'], ['c']],
               [['x']]]
So basically a List[List[List[str]]]. For a non-nested list this would be something simple like df['col'].str.split().tolist(), but I have no clue how to do it for this case. Thanks a lot in advance!
You can try applying a custom function that walks the string character by character and uses a stack to build the nested lists:
def parse_list(s: str):
    stack = []
    for c in s:
        if c == '[':  # beginning of a list: push a marker
            stack.append(c)
        elif c == ']':  # end of a list: collect its elements
            e = []
            while True:
                b = stack.pop()  # pop until we reach the matching '['
                if b == '[':
                    break
                e.append(b)  # elements come off the stack in reverse order
            stack.append(e[::-1])  # restore the original order, push the finished list
        elif c == ',' or c == ' ':  # separators are skipped
            continue
        else:
            stack.append(c)  # single-character element; may need changes for longer ones
    return stack[0]
out = df['col'].apply(parse_list).to_list()
Output:
[[['a', 'b', 'c'], ['a', 'b']], [['a', 'b'], ['c']], [['x']]]
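As an alternative to the stack-based parse_list, if the elements are always plain words (letters, digits or underscores, with no quotes, spaces or brackets inside), you could quote every word with a regex so the string becomes a valid Python literal, and then let ast.literal_eval do the parsing. This is a minimal sketch under that assumption about the token format; the name parse_with_literal_eval is hypothetical:

```python
import ast
import re


def parse_with_literal_eval(s: str) -> list:
    # Assumption: every element is a run of word characters (\w+).
    # Wrap each run in single quotes, e.g. '[[a, b]]' -> "[['a', 'b']]".
    quoted = re.sub(r"(\w+)", r"'\1'", s)
    # literal_eval only evaluates Python literals, so no arbitrary code runs.
    return ast.literal_eval(quoted)


print(parse_with_literal_eval('[[a, b, c], [a,b]]'))  # [['a', 'b', 'c'], ['a', 'b']]
```

It is applied the same way: out = df['col'].apply(parse_with_literal_eval).to_list(). Multi-character words such as foo work here too, but elements containing spaces, quotes or brackets would break the quoting step.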
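Note: parse_list above only works when every element is a single character (a, b, c, ...). A longer element such as foo would be split into separate characters, so for general strings you need to collect characters into a token and treat , and the brackets as the points where a token ends.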
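One way to make that modification is sketched below. It keeps the stack idea but pushes a fresh list on [ and buffers characters until a , or bracket ends the token. This is a minimal sketch, assuming well-formed input and that elements never contain commas or brackets themselves; the names parse_nested and flush are hypothetical:

```python
def parse_nested(s: str) -> list:
    # Stack of lists under construction; the bottom entry is a wrapper list.
    stack = [[]]
    token = []  # characters of the element currently being read

    def flush():
        # Add the pending element (surrounding whitespace removed) to the
        # innermost open list; whitespace-only tokens are ignored.
        text = ''.join(token).strip()
        if text:
            stack[-1].append(text)
        token.clear()

    for c in s:
        if c == '[':  # start a new, empty inner list
            flush()
            stack.append([])
        elif c == ']':  # close the innermost list and attach it to its parent
            flush()
            finished = stack.pop()
            stack[-1].append(finished)
        elif c == ',':  # a comma ends the current element
            flush()
        else:
            token.append(c)
    flush()
    # For well-formed input the wrapper holds exactly one top-level list.
    return stack[0][0]


print(parse_nested('[[foo, bar baz], [qux]]'))  # [['foo', 'bar baz'], ['qux']]
```

Usage with the DataFrame is unchanged: out = df['col'].apply(parse_nested).to_list(). Building lists directly on the stack avoids the pop-and-reverse step of parse_list.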