I have retrieved data I need in one dataframe, one of the column has this list of dict
[{'date': '2023-02-03T00:00:00', 'groups': [{'periodType': 'm',
'projections': [{'identifier': 'TD3BALMO', 'period': 'Feb 23', 'value': 54.621, 'validFrom': '2023-02-01', 'validTo': '2023-02-28', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3CURMON', 'period': 'Feb 23', 'value': 53.855, 'validFrom': '2023-02-01', 'validTo': '2023-02-28', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+1_M', 'period': 'Mar 23', 'value': 55.387, 'validFrom': '2023-03-01', 'validTo': '2023-03-31', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+2_M', 'period': 'Apr 23', 'value': 55.174, 'validFrom': '2023-04-01', 'validTo': '2023-04-28', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+3_M', 'period': 'May 23', 'value': 55.748, 'validFrom': '2023-05-01', 'validTo': '2023-05-31', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+4_M', 'period': 'Jun 23', 'value': 55.608, 'validFrom': '2023-06-01', 'validTo': '2023-06-30', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+5_M', 'period': 'Jul 23', 'value': 52.548, 'validFrom': '2023-07-01', 'validTo': '2023-07-31', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'}]},
{'periodType': 'q',
'projections': [{'identifier': 'TD3CURQ', 'period': 'Q1 23', 'value': 52.638, 'validFrom': '2023-01-01', 'validTo': '2023-03-31', 'nextRolloverDate': '2023-03-31', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+1Q', 'period': 'Q2 23', 'value': 55.51, 'validFrom': '2023-04-01', 'validTo': '2023-06-30', 'nextRolloverDate': '2023-03-31', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+2Q', 'period': 'Q3 23', 'value': 51.729, 'validFrom': '2023-07-01', 'validTo': '2023-09-29', 'nextRolloverDate': '2023-03-31', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+3Q', 'period': 'Q4 23', 'value': 62.63, 'validFrom': '2023-10-01', 'validTo': '2023-12-22', 'nextRolloverDate': '2023-03-31', 'archiveDate': '2023-02-03'}]
}
]
}]
What's the easiest way to convert it to below? (sorry the numbers are not the same but you get the idea) I tried json_normalize but haven't found an efficient way to convert to below? in fact I only need data from the first 3 columns: identifier, period, value
identifier period value ... validTo nextRolloverDate archiveDate
0 TD3BALMO Feb 23 68.464 ... 2023-02-28 2023-02-28 2023-02-21
1 TD3CURMON Feb 23 60.955 ... 2023-02-28 2023-02-28 2023-02-21
2 TD3+1_M Mar 23 67.128 ... 2023-03-31 2023-02-28 2023-02-21
3 TD3+2_M Apr 23 63.499 ... 2023-04-28 2023-02-28 2023-02-21
4 TD3+3_M May 23 59.734 ... 2023-05-31 2023-02-28 2023-02-21
Assuming you already loaded the JSON into python object data
:
from itertools import chain
import pandas as pd
data = [{'date': '2023-02-03T00:00:00', 'groups': [{'periodType': 'm',
'projections': [{'identifier': 'TD3BALMO', 'period': 'Feb 23', 'value': 54.621, 'validFrom': '2023-02-01', 'validTo': '2023-02-28', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3CURMON', 'period': 'Feb 23', 'value': 53.855, 'validFrom': '2023-02-01', 'validTo': '2023-02-28', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+1_M', 'period': 'Mar 23', 'value': 55.387, 'validFrom': '2023-03-01', 'validTo': '2023-03-31', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+2_M', 'period': 'Apr 23', 'value': 55.174, 'validFrom': '2023-04-01', 'validTo': '2023-04-28', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+3_M', 'period': 'May 23', 'value': 55.748, 'validFrom': '2023-05-01', 'validTo': '2023-05-31', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+4_M', 'period': 'Jun 23', 'value': 55.608, 'validFrom': '2023-06-01', 'validTo': '2023-06-30', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+5_M', 'period': 'Jul 23', 'value': 52.548, 'validFrom': '2023-07-01', 'validTo': '2023-07-31', 'nextRolloverDate': '2023-02-28', 'archiveDate': '2023-02-03'}]},
{'periodType': 'q',
'projections': [{'identifier': 'TD3CURQ', 'period': 'Q1 23', 'value': 52.638, 'validFrom': '2023-01-01', 'validTo': '2023-03-31', 'nextRolloverDate': '2023-03-31', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+1Q', 'period': 'Q2 23', 'value': 55.51, 'validFrom': '2023-04-01', 'validTo': '2023-06-30', 'nextRolloverDate': '2023-03-31', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+2Q', 'period': 'Q3 23', 'value': 51.729, 'validFrom': '2023-07-01', 'validTo': '2023-09-29', 'nextRolloverDate': '2023-03-31', 'archiveDate': '2023-02-03'},
{'identifier': 'TD3+3Q', 'period': 'Q4 23', 'value': 62.63, 'validFrom': '2023-10-01', 'validTo': '2023-12-22', 'nextRolloverDate': '2023-03-31', 'archiveDate': '2023-02-03'}]
}
]
}]
proj = [item.get('projections', []) for dt in data for item in dt['groups']]
df = pd.DataFrame(chain(*proj))
print(df)
output
identifier period value validFrom validTo nextRolloverDate archiveDate
0 TD3BALMO Feb 23 54.621 2023-02-01 2023-02-28 2023-02-28 2023-02-03
1 TD3CURMON Feb 23 53.855 2023-02-01 2023-02-28 2023-02-28 2023-02-03
2 TD3+1_M Mar 23 55.387 2023-03-01 2023-03-31 2023-02-28 2023-02-03
3 TD3+2_M Apr 23 55.174 2023-04-01 2023-04-28 2023-02-28 2023-02-03
4 TD3+3_M May 23 55.748 2023-05-01 2023-05-31 2023-02-28 2023-02-03
5 TD3+4_M Jun 23 55.608 2023-06-01 2023-06-30 2023-02-28 2023-02-03
6 TD3+5_M Jul 23 52.548 2023-07-01 2023-07-31 2023-02-28 2023-02-03
7 TD3CURQ Q1 23 52.638 2023-01-01 2023-03-31 2023-03-31 2023-02-03
8 TD3+1Q Q2 23 55.510 2023-04-01 2023-06-30 2023-03-31 2023-02-03
9 TD3+2Q Q3 23 51.729 2023-07-01 2023-09-29 2023-03-31 2023-02-03
10 TD3+3Q Q4 23 62.630 2023-10-01 2023-12-22 2023-03-31 2023-02-03