python pandas list dictionary nested

Extracting a specific key from lists of dictionaries in a pandas Series (data read from an Excel file)


I am pulling specific values out of data in a nested dictionary format. The first selection step produced the Series below, in which each row is a list of dictionaries (or an empty list).

dataset 

0                                                      []
1                                                      []
2                                                      []
3                [{'A': 1, 'B': 2, 'C': 'information1'}]
4       [{'A': 3, 'B': 4, 'C': 'information2'}, {'...

type(dataset[0])
=> list
type(dataset)
=> pandas.core.series.Series

I am now trying to extract one specific key ('C') from each of these lists. Running the following code on a single row successfully pulls the value.

[d['C'] for d in dataset[3]]
=> ['information1']

However, the data has over 40k rows, so I wrapped the expression in the following function and applied it to the whole Series. Running it raises an error.

def get_c(d):
    return [d['C'] for d in dataset]
dataset2 = dataset.apply(get_c)

=> TypeError: list indices must be integers or slices, not str

Any help would be greatly appreciated.


Solution

  • Since the column values are Python lists, you can explode the column, get the dictionary values using Series.str['key'], then call dropna() and finally to_list() to get the values as a flat list:

    >>> df[0].explode().str['C'].dropna().to_list()
    
    ['information1', 'information2']
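
    For reference, the TypeError in the question most likely comes from get_c iterating over the whole `dataset` instead of its own argument: each element it sees is then a list, and indexing a list with 'C' fails. Below is a minimal sketch of both approaches, assuming the data lives in a Series named `dataset` as in the question. The sample rows are hypothetical; in particular, the second dictionary in the last row is made up, because the original output is truncated there.

```python
import pandas as pd

# Hypothetical sample shaped like the Series in the question.
# The second dict in the last row is invented for illustration.
dataset = pd.Series([
    [],
    [],
    [],
    [{'A': 1, 'B': 2, 'C': 'information1'}],
    [{'A': 3, 'B': 4, 'C': 'information2'},
     {'A': 5, 'B': 6, 'C': 'information3'}],
])


def get_c(row):
    # `row` is a single cell of the Series: a list of dicts (possibly empty).
    # Iterate over the argument, not over the whole Series.
    return [d['C'] for d in row]


# Per-row result: one list of 'C' values for each original row.
per_row = dataset.apply(get_c)

# Flat result: explode the lists into one dict per row (empty lists become NaN),
# pull 'C' out of each dict, drop the NaN rows, and convert to a plain list.
flat = dataset.explode().str['C'].dropna().tolist()

print(per_row.tolist())
print(flat)
```

    The apply version keeps the original row alignment, which may matter if the result has to be stored back next to the source column. The explode version returns a single flat list and discards the empty rows.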