I have 3 different dataframes that represents layers of geo boundaries of cities, counties, and states respectively.
Df1 (Polygon column is for the city boundaries)
City ID | City | County ID | County | State ID | State | geometry
12345 123 12 POLYGON (0.0,…..)
Df 2 (Polygon column is for the county boundaries)
County ID | County | State ID | State | geometry
475 47 POLYGON (0.0, …..)
Df 3 (Polygon column is for the State boundaries)
State ID | State | geometry
25 POLYGON (0.0, …..)
I want to create a tree like data structure in JSON format that looks like the following :
State 1 (ID, Name, Polygon(…))
├── County 1 (ID, Name, Polygon(…))
└── City 1 (ID, Name, Polygon(…))
└── County 2
└── City 1
└── City 2
└── City 3
State 2
├── County 1
└── City 1
└── County 2
└── City 1
Etc …
The goal is to be able to search any level of hierarchy and display the corresponding polygone , whether its of the city , the county or the state
What is the way to proceed? How to create nested values and structure the JSON format starting from pandas dataframe.
Is tree like data structure is the right one ? Should I create an adjacency list ? How ?
Many thanx
A tree like data structure can be created in JSON format using following code with Pandas:
import pandas as pd
import json
# Example DataFrames
df1 = pd.DataFrame({
'City ID': [12345],
'City': ['City 1'],
'County ID': [123],
'County': ['County 1'],
'State ID': [12],
'State': ['State 1'],
'geometry': ['POLYGON (0.0, …)']
})
df2 = pd.DataFrame({
'County ID': [123],
'County': ['County 1'],
'State ID': [12],
'State': ['State 1'],
'geometry': ['POLYGON (0.0, …)']
})
df3 = pd.DataFrame({
'State ID': [12],
'State': ['State 1'],
'geometry': ['POLYGON (0.0, …)']
})
# Step 1: Merge DataFrames
df_merged = df1.merge(df2, on=['County ID', 'State ID', 'State'], how='left').merge(df3, on=['State ID', 'State'], how='left')
# Step 2: Build nested dictionary representing the tree structure
tree_dict = {}
for _, row in df_merged.iterrows():
state_id = row['State ID']
state_name = row['State']
county_id = row['County ID']
county_name = row['County']
city_id = row['City ID']
city_name = row['City']
polygon = row['geometry']
# Add state to the tree dictionary if it doesn't exist
if state_id not in tree_dict:
tree_dict[state_id] = {
'ID': state_id,
'Name': state_name,
'Polygon': polygon,
'Counties': {}
}
# Add county to the tree dictionary if it doesn't exist
if county_id not in tree_dict[state_id]['Counties']:
tree_dict[state_id]['Counties'][county_id] = {
'ID': county_id,
'Name': county_name,
'Polygon': polygon,
'Cities': {}
}
# Add city to the tree dictionary
tree_dict[state_id]['Counties'][county_id]['Cities'][city_id] = {
'ID': city_id,
'Name': city_name,
'Polygon': polygon
}
# Step 3: Convert nested dictionary to JSON format
json_data = json.dumps(tree_dict, indent=4)
# Print the JSON data
print(json_data)
This code merges the given DataFrames based on the shared ID columns (City ID, County ID, State ID) to create a unified DataFrame. It then builds a nested dictionary structure representing the tree hierarchy, starting from the state level down to the city level. Finally, it converts the nested dictionary into JSON format using the json.dumps() function.
The resulting JSON structure follows the desired tree-like format, where states have nested counties, and counties have nested cities. Each level of the tree contains the relevant ID, Name, and Polygon information.