I have data in a column that looks like this in a pandas dataframe:
'Column name':
[{'Name': Dan Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6}, {'Name': Bob Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6}],
[{'Name': Shelly Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6}, {'Name': Sam Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6}],
{'Name': Jane Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6},
[{'Name': Chris Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6}, {'Name': Darryl Smith, 'Attribute1': 4, 'Attribute2': 15, 'Attribute3': 6}],
Companies are separated by [] except in instances where the company only has 1 observation (such as the 3rd observation in this example with Jane Smith). My problem is trying to parse the nested keys when the nested keys are identical. My goal is to grab the attribute with the highest value for each company.
I've tried:
df = df.explode('Column Name')
However, that does not do anything. The observations appear the same as before. After some research I tried the following
from ast import literal_eval
df['Column name'] = df['Column name'].apply(literal_eval)
df = df.explode('Column Name')
However, when I do this, I get a "KeyError:0" return. I found that this error occurs due to instances like the 3rd row where there is only 1 observation for that company. I can explode small samples of my data and grab the highest attribute and proceed according to plan. However, I have 1.62 million rows so splitting the sample into small batches is not wise.
Is there a way to pass the 'KeyError:0' exceptions? Or is there a better way to get where I'm trying to go? I'm new to Python/Pandas.
def tolist(x):
if isinstance(x, dict):
return [x]
else:
return x
df['Column name'] = df['Column name'].apply(literal_eval).apply(tolist)
df = df.explode('Column name')
To use explode, every row must be a sequence type (list
for this case). The first thing you need to do is clean up any rows where it's a single element and convert it a list of one element
[{'Name': Jane Smith, 'Attribute1': 4, 'Attribute2': 10, 'Attribute3': 6}],