I am extracting specific data from a nested dictionary. After a first selection step, I obtained the following pandas Series, in which every row is a list.
dataset
0 []
1 []
2 []
3 [{'A': 1, 'B': 2, 'C': 'information1'}]
4 [{'A': 3, 'B': 4, 'C': 'information2'}, {'...
type(dataset[0])
=> list
type(dataset)
=> pandas.core.series.Series
I now want to extract one specific key ('C') from these rows. Running the following list comprehension on a single row works:
[d['C'] for d in dict_test2[1]]
=> ['information1']
However, the data has over 40k rows, so I wrote the following function and applied it to the whole Series. That raises an error:
def get_c(d):
    return [d['C'] for d in dataset]
dataset2 = dataset.apply(get_c)
=> TypeError: list indices must be integers or slices, not str
Any help would be greatly appreciated.
Since the column values are Python lists, you can explode the column, get the dictionary values using Series.str['key'], drop the resulting NaN entries with dropna(), and finally call to_list() to get the values as a list:
>>> df[0].explode().str['C'].dropna().to_list()
['information1', 'information2']
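As for the error itself: apply already passes each row (a list) to the function, but the comprehension iterates over the global dataset instead of the argument, so d ends up being a whole list and d['C'] fails. Below is a minimal sketch showing both the flat result and a per-row variant. The sample Series is made up to mirror the question (the second dict in the last row, including 'information3', is purely illustrative), and the parameter name row is my own.

```python
import pandas as pd

# Hypothetical sample data mirroring the question:
# a Series whose rows are lists of dicts (possibly empty).
dataset = pd.Series([
    [],
    [],
    [],
    [{'A': 1, 'B': 2, 'C': 'information1'}],
    [{'A': 3, 'B': 4, 'C': 'information2'},
     {'A': 5, 'B': 6, 'C': 'information3'}],  # illustrative second dict
])

# Flat list of every 'C' value: explode puts one dict per row
# (empty lists become NaN), .str['C'] looks up the key, dropna removes the NaNs.
flat = dataset.explode().str['C'].dropna().to_list()

def get_c(row):
    # Iterate over the row that apply passes in, not over the whole Series.
    return [d['C'] for d in row]

# One list of 'C' values per original row, keeping the original index.
per_row = dataset.apply(get_c)

print(flat)
print(per_row)
```

The explode version is probably the better fit if you only need the values. The apply version may be preferable if you need to keep track of which row each value came from.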