I have the following nested dictionary that contains the information that a person publicly reported to the organization.
`person_json={'basicInformation': {'individualId': 5429958,
'firstName': 'BARBARA','middleName': 'JEAN','lastName': 'ABADI',
'sanctions': {'permanentBar': 'Y',
'sanctionDetails': [{'category': 'BAR',
'regulator': 'FINRA',
'messages': ['FINRA_BAR_ALL_MESSAGE'],
'detail': [],
'capacity': ['ALL']}]},
'otherNames': [],
'bcScope': 'InActive',
'iaScope': 'NotInScope',
'daysInIndustry': 3185},
'currentEmployments': [],
'currentIAEmployments': [],
'previousEmployments': [{'iaOnly': 'N',
'bdSECNumber': '44376',
'firmId': 29705,
'firmName': 'ACGM, INC.',
'street1': '590 MADISON AVENUE, 41ST FLOOR, SUITE 4103',
'city': 'NEW YORK',
'state': 'NY',
'zipCode': '10022',
'registrationBeginDate': '6/1/2011',
'registrationEndDate': '2/1/2017',
'firmBCScope': 'INACTIVE',
'firmIAScope': 'NOTINSCOPE'},
{'iaOnly': 'N',
'bdSECNumber': '49900',
'firmId': 42619,
'firmName': 'ACTIVA CAPITAL MARKETS, INC.',
'street1': '590 MADISON AVENUE',
'street2': '41ST FLOOR',
'city': 'NEW YORK',
'state': 'NY',
'zipCode': '10022',
'registrationBeginDate': '2/1/2008',
'registrationEndDate': '2/16/2011',
'firmBCScope': 'INACTIVE',
'firmIAScope': 'NOTINSCOPE'}],
'previousIAEmployments': [],
'disclosureFlag': 'Y',
'iaDisclosureFlag': 'N',
'disclosures': [{'eventDate': '4/4/2017',
'disclosureType': 'Regulatory',
'disclosureResolution': 'Final',
'isIapdExcludedCCFlag': 'N',
'isBcExcludedCCFlag': 'N',
'bcCtgryType': 11,
'disclosureDetail': {'DocketNumberFDA': '2015044587502',
'DocketNumberAAO': '2015044587502',
'Initiated By': 'FINRA',
'Allegations': 'Without admitting or denying the findings, Abadi consented to the
sanctions and to the entry of findings that she refused to produce documents and information and to appear for on-the-record testimony requested by FINRA in connection with its investigation concerning certain suspicious fund transfers involving her member firm and affiliates thereof indirectly owned and controlled by Abadi.',
'Resolution': 'Acceptance, Waiver & Consent(AWC)',
'SanctionDetails': [{'Sanctions': 'Bar (Permanent)',
'SanctionDetails': [{'Registration Capacities Affected': 'All Capacities',
'Duration': 'Indefinite',
'Start Date': '4/4/2017'}]}]}}],
'examsCount': {'stateExamCount': 1,
'principalExamCount': 1,
'productExamCount': 2},
'stateExamCategory': [{'examCategory': 'Series 63',
'examName': 'Uniform Securities Agent State Law Examination',
'examTakenDate': '3/13/2008',
'examScope': 'BC'}],
'principalExamCategory': [{'examCategory': 'Series 24',
'examName': 'General Securities Principal Examination',
'examTakenDate': '9/26/2008',
'examScope': 'BC'}],
'productExamCategory': [{'examCategory': 'SIE',
'examName': 'Securities Industry Essentials Examination',
'examTakenDate': '2/1/2017',
'examScope': 'BC'},
{'examCategory': 'Series 7',
'examName': 'General Securities Representative Examination',
'examTakenDate': '1/31/2008',
'examScope': 'BC'}],
'registrationCount': {'approvedSRORegistrationCount': 0,
'approvedFinraRegistrationCount': 0,
'approvedStateRegistrationCount': 0},
'registeredStates': [],
'registeredSROs': [],
'brokerDetails': {'hasBCComments': 'N',
'hasIAComments': 'N',
'legacyReportStatusDescription': 'Not Requested'}}`
What I try to do is to tabulate the information in the above nested dictionary. This is the code I used. The table is named 'test1'.
`test1=pds.json_normalize(person_json)`
What I got from 'test1'is the following table. I captured the photo of some part of the table.
As you may see, there are still a list in "previousEmployments", "disclosures", "stateExamCategory", "principalExamCategory", "productExamCategory", and I am trying to flatten these data and joining them together in the table format.
The expected outcome should be a dataframe/a table that contain all information from all keys and all 'sub' keys of person_json
.
How could I do this?
Thank you,
Assuming you ignore how should look like your final table/df, here is a proposition with flatten_json
/json_normalize
to make a DataFrame where the columns are the outer keys of person_json
:
#pip install flatten_json
from flatten_json import flatten
tmp = pd.json_normalize(flatten(person_json))
df = (
tmp.set_axis(tmp.columns.str.split("_", n=1, expand=True), axis=1)
.stack(1).droplevel(0)
)
Output :
print(df)
basicInformation ... stateExamCategory
NaN NaN ... NaN
0_bcCtgryType NaN ... NaN
0_bdSECNumber NaN ... NaN
... ... ... ...
sanctions_sanctionDetails_0_messages_0 FINRA_BAR_ALL_MESSAGE ... NaN
sanctions_sanctionDetails_0_regulator FINRA ... NaN
stateExamCount NaN ... NaN
[72 rows x 16 columns]
If you wanna retrieve a specific portion/slice, you can use something like :
query = df.filter(regex=".*ExamCount.*", axis=0).dropna(how="all", axis=1)
# query = df.filter(regex=".*ExamCount.*", axis=0).loc[:, "examsCount"] #variant
print(query)
examsCount
principalExamCount 1.00
productExamCount 2.00
stateExamCount 1.00