Search code examples
pythonpandasdictionarypandas-explode

dataframe: parse a column containing list of dicts: Traceback ValueError: cannot reindex on an axis with duplicate labels


I have one column(called 'data') in a dataframe which looks like this, each row has a list of dicts, starting with 2022-01-04, ended today, for example the 1st row is {'value': 18.76, 'date': '2022-01-04'}, {'value': 18.59, 'date': '2022-01-05'}, {'value': 18.99, 'date': '2022-01-06'}...

0     [{'value': 18.76, 'date': '2022-01-04'}, {'val...
1     [{'value': 38.58, 'date': '2022-01-04'}, {'val...
2     [{'value': 37.5, 'date': '2022-01-04'}, {'valu...
3     [{'value': 61.77, 'date': '2022-01-04'}, {'val...
4     [{'value': 110.54, 'date': '2022-01-04'}, {'va...
5     [{'value': 101.71, 'date': '2022-01-04'}, {'va...
6     [{'value': 86.45, 'date': '2022-01-04'}, {'val...
7     [{'value': 97.95, 'date': '2022-01-04'}, {'val...
8     [{'value': 38.39, 'date': '2022-01-04'}, {'val...
9     [{'value': 217.92, 'date': '2022-01-04'}, {'va...
10    [{'value': 86.94, 'date': '2022-01-04'}, {'val...
11    [{'value': 55.2, 'date': '2022-01-04'}, {'valu...
12    [{'value': 138.97, 'date': '2022-01-04'}, {'va...
13    [{'value': 4853125.0, 'date': '2022-01-04'}, {...
14    [{'value': 29.12, 'date': '2022-01-04'}, {'val...
15    [{'value': 90.77, 'date': '2022-01-04'}, {'val...
16    [{'value': 87.15, 'date': '2022-01-04'}, {'val...

I used a line of code which worked before

df[['date','value']] = df['data'].apply(lambda x: [[i['date'],i['value']] for i in x]).explode().apply(pd.Series, index=['date','value'])

but now this line fails and gives

ValueError: cannot reindex on an axis with duplicate labels

Is there an easy solution to solve this issue? as there are 300 dates hence 300 data points for each row, I am not sure which dates may contain duplicate data??


Solution

  • You can try:

    df = df.explode("Column1")
    df = pd.concat([df, df.pop("Column1").apply(pd.Series)], axis=1)
    
    print(df)
    

    Prints:

             value        date
    0        18.76  2022-01-04
    0        18.59  2022-01-05
    0        18.99  2022-01-06
    1        38.58  2022-01-04
    2        37.50  2022-01-04
    
    ...and so on.