Search code examples
pythonjsonpandasdataframenested

How to load a nested json file into a pandas DataFrame


please help I cannot seem to get the json data into a Dataframe.

loaded the data

data =json.load(open(r'path'))#this works fine and displays:

json data

{'type': 'FeatureCollection', 'name': 'Altstadt Nord', 'crs': {'type': 'name', 'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}}, 'features': [{'type': 'Feature', 'properties': {'Name': 'City-Martinsviertel', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9595637, 50.9418396], [6.956624, 50.9417382], [6.9543173, 50.941603], [6.9529869, 50.9413664], [6.953062, 50.9408593], [6.9532873, 50.9396289], [6.9533624, 50.9388176], [6.9529333, 50.9378373], [6.9527509, 50.9371815], [6.9528367, 50.9360659], [6.9532122, 50.9352884], [6.9540705, 50.9350653], [6.9553258, 50.9350044], [6.9568815, 50.9351667], [6.9602074, 50.9355047], [6.9608189, 50.9349165], [6.9633939, 50.9348827], [6.9629433, 50.9410622], [6.9616236, 50.9412176], [6.9603898, 50.9414881], [6.9595637, 50.9418396]]]}}, {'type': 'Feature', 'properties': {'Name': 'Gereonsviertel', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9629433, 50.9410622], [6.9629433, 50.9431646], [6.9611408, 50.9433539], [6.9601752, 50.9436649], [6.9588234, 50.9443409], [6.9579651, 50.9449763], [6.9573213, 50.945801], [6.9563128, 50.9451926], [6.9551756, 50.9448546], [6.9535663, 50.9446518], [6.9523432, 50.9449763], [6.9494464, 50.9452602], [6.9473435, 50.9454495], [6.9466998, 50.9456928], [6.9458415, 50.946531], [6.9434168, 50.9453954], [6.9424726, 50.9451926], [6.9404342, 50.9429888], [6.9404771, 50.9425156], [6.9403269, 50.9415016], [6.9400479, 50.9405281], [6.9426228, 50.9399872], [6.9439103, 50.9400143], [6.9453051, 50.9404875], [6.9461634, 50.9408931], [6.9467427, 50.941096], [6.9475581, 50.9410013], [6.9504227, 50.9413191], [6.9529869, 50.9413664], [6.9547464, 50.9416368], [6.9595637, 50.9418396], [6.9603898, 50.9414881], [6.9616236, 50.9412176], [6.9629433, 50.9410622]]]}}, {'type': 'Feature', 'properties': {'Name': 'Kunibertsviertel', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9629433, 50.9431646], [6.9637129, 50.9454917], [6.9651506, 50.9479252], [6.9666097, 50.9499124], [6.9667599, 50.9500882], [6.9587777, 50.9502504], [6.9573213, 50.945801], [6.9579651, 50.9449763], [6.9588234, 50.9443409], [6.9601752, 50.9436649], [6.9611408, 50.9433539], [6.9629433, 50.9431646]]]}}, {'type': 'Feature', 'properties': {'Name': 'Nördlich Neumarkt', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9390331, 50.9364418], [6.9417153, 50.9358738], [6.9462214, 50.9358062], [6.9490109, 50.9355628], [6.9505129, 50.9353329], [6.9523798, 50.9352924], [6.9532122, 50.9352884], [6.9528367, 50.9360659], [6.9527509, 50.9371815], [6.9529333, 50.9378373], [6.9533624, 50.9388176], [6.9532381, 50.9398222], [6.9529869, 50.9413664], [6.9504227, 50.9413191], [6.9475581, 50.9410013], [6.9467427, 50.941096], [6.9453051, 50.9404875], [6.9439103, 50.9400143], [6.9424663, 50.9399574], [6.9400479, 50.9405281], [6.9390331, 50.9364418]]]}}]}

now i cannot seem to fit it into a Dataframe // pd.DataFrame(data) --> ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.full error

I tried to flatten with json_flatten but ModuleNotFoundError: No module named 'flatten_json' even though I installed json-flatten via pip install

also tried df =pd.DataFrame.from_dict(data,orient='index')

df
Out[22]: 
                                                          0
type                                      FeatureCollection
name                                          Altstadt Nord
crs       {'type': 'name', 'properties': {'name': 'urn:o...
features  [{'type': 'Feature', 'properties': {'Name': 'C...

df Out[22]


Solution

  • I think you can use json_normalize to load them to pandas.

    test.json in this case is your full json file (with double quotes).

    
    import json
    from pandas.io.json import json_normalize
    
    with open('path_to_json.json') as f:
      data = json.load(f)
      df = json_normalize(data, record_path=['features'], meta=['name']) 
    
    print(df)
    
    

    This results in a dataframe as shown below. Output Dataframe

    You can further add record field in the normalize method to create more columns for the polygon coordinates.

    You can find more documentation at https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.json_normalize.html

    Hope that helps.