I had a JSON file that was heavily nested that I normalized into a Pandas dataframe. The result was that all of the keys were turned into columns and the values into rows. The problem is it appears I have some columns that still contain dicts and were not properly normalized. The dataframe has over 8000 rows and 3000 columns so this can't be done by hand.
Here is the JSON file: https://justpaste.it/9nfke
For example:
I have the following column named:
Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp
which contains dicts, like the following:
[{'Desc': 'MISCELLANEOUS', 'ExclusionCd': '01', 'ExclusionAmt': '13'}, {'Desc': 'GRANT REFUNDS', 'RelatedOrExemptFunctionIncmAmt': '159502'}]
As you can see there's a Desc
ExclusionCd
ExclusionAmt
etc.
I already have populated columns in my dataframes named for these:
Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp.BusinessCd Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp.Desc Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp.ExclusionAmt Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp.ExclusionCd Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp.RelatedOrExemptFunctionIncmAmt Return.ReturnData.IRS990PF.AnalysisIncomeProducingActyGrp.OtherRevenueDescribedGrp.UnrelatedBusinessTaxblIncmAmt
How can I move those values into their respective columns? Keep in mind that I probably have hundreds of these and I can't do it by hand. It needs to be automated. Thanks!
pd.json_normalize
.
dict
s as values in a row, when you run this locally.Return.ReturnData.IRS990PF
is not available in the PasteBin data sample.
IRS990
keys, only IRS990
and IRS990Schedule...
are available.meta
parameter of json_normalize
import pandas as pd
import json
from pathlib import Path
# path to file
p = Path('c:\some_path\test.json')
# read in the JSON file
with p.open('r', encoding='utf-8') as f:
data = json.loads(f.read())
# parse with pandas
df = pd.json_normalize(data)
# if there's a list of dictionaries in the resulting dataframe, they can be unpacked with something like
df['Return.ReturnData.IRS990ScheduleO.SupplementalInformationDetail'].apply(pd.json_normalize)