Search code examples
pythonjsonpandaslinked-listtree

Create tree like data structure in JSON format from Pandas Data frames using python


I have 3 different dataframes that represents layers of geo boundaries of cities, counties, and states respectively.

Df1 (Polygon column is for the city boundaries)

City ID |  City  | County ID |  County  | State ID |  State  | geometry

12345               123                                                 12         POLYGON (0.0,…..)

Df 2 (Polygon column is for the county boundaries)

County ID |  County  | State ID |  State  | geometry

475                                                 47         POLYGON (0.0, …..)

Df 3 (Polygon column is for the State boundaries)

State ID |  State  | geometry

25                  POLYGON (0.0, …..)

I want to create a tree like data structure in JSON format that looks like the following :

State 1 (ID, Name, Polygon(…))

├── County 1 (ID, Name, Polygon(…))

    └── City 1 (ID, Name, Polygon(…))

└── County 2

    └── City 1

    └── City 2

    └── City 3

State 2

├── County 1

    └── City 1

└── County 2

    └── City 1

Etc …

The goal is to be able to search any level of hierarchy and display the corresponding polygone , whether its of the city , the county or the state

What is the way to proceed? How to create nested values and structure the JSON format starting from pandas dataframe.

Is tree like data structure is the right one ? Should I create an adjacency list ? How ?

Many thanx


Solution

  • A tree like data structure can be created in JSON format using following code with Pandas:

    import pandas as pd
    import json
    
    # Example DataFrames
    df1 = pd.DataFrame({
        'City ID': [12345],
        'City': ['City 1'],
        'County ID': [123],
        'County': ['County 1'],
        'State ID': [12],
        'State': ['State 1'],
        'geometry': ['POLYGON (0.0, …)']
    })
    
    df2 = pd.DataFrame({
        'County ID': [123],
        'County': ['County 1'],
        'State ID': [12],
        'State': ['State 1'],
        'geometry': ['POLYGON (0.0, …)']
    })
    
    df3 = pd.DataFrame({
        'State ID': [12],
        'State': ['State 1'],
        'geometry': ['POLYGON (0.0, …)']
    })
    
    # Step 1: Merge DataFrames
    df_merged = df1.merge(df2, on=['County ID', 'State ID', 'State'], how='left').merge(df3, on=['State ID', 'State'], how='left')
    
    # Step 2: Build nested dictionary representing the tree structure
    tree_dict = {}
    
    for _, row in df_merged.iterrows():
        state_id = row['State ID']
        state_name = row['State']
        county_id = row['County ID']
        county_name = row['County']
        city_id = row['City ID']
        city_name = row['City']
        polygon = row['geometry']
        
        # Add state to the tree dictionary if it doesn't exist
        if state_id not in tree_dict:
            tree_dict[state_id] = {
                'ID': state_id,
                'Name': state_name,
                'Polygon': polygon,
                'Counties': {}
            }
        
        # Add county to the tree dictionary if it doesn't exist
        if county_id not in tree_dict[state_id]['Counties']:
            tree_dict[state_id]['Counties'][county_id] = {
                'ID': county_id,
                'Name': county_name,
                'Polygon': polygon,
                'Cities': {}
            }
        
        # Add city to the tree dictionary
        tree_dict[state_id]['Counties'][county_id]['Cities'][city_id] = {
            'ID': city_id,
            'Name': city_name,
            'Polygon': polygon
        }
    
    # Step 3: Convert nested dictionary to JSON format
    json_data = json.dumps(tree_dict, indent=4)
    
    # Print the JSON data
    print(json_data)
    

    This code merges the given DataFrames based on the shared ID columns (City ID, County ID, State ID) to create a unified DataFrame. It then builds a nested dictionary structure representing the tree hierarchy, starting from the state level down to the city level. Finally, it converts the nested dictionary into JSON format using the json.dumps() function.

    The resulting JSON structure follows the desired tree-like format, where states have nested counties, and counties have nested cities. Each level of the tree contains the relevant ID, Name, and Polygon information.