I have downloaded the NYC Taxi Zones dataset (downloaded from SODA Api and saved as json file - Not GeoJson or Shapefile). The dataset is rather small, thus I am using the whole information included. For the convenience of the post I am presenting the first 2 rows of the dataset:
the_geom
).write_ndjson()
commandBelow the dataset after applying unnest()
and selecting some columns and the first 2 rows.
You may import the data using polars with the following command
import polars as pl
poc = pl.read_json("./data.json"))
I am interested in the multipolygons. I am actually trying to re-calculate the shape_area
by using the multipolygon and wkt (Well-Known-Text representation) - method used by shapely
module.
What I have done so far is to use the column coordinates
and transform it to a MultiPolygon() object - readable by the Shapely module.
def flatten_list_of_lists(data):
return [subitem3 for subitem1 in data for subitem2 in subitem1 for subitem3 in subitem2]
The function takes as input a list[list[list[list[f64]]]]
object and transforms to a list[list[f64]]
object.
flattened_lists = [flatten_list_of_lists(row) for row in poc["coordinates"].to_list()]
print(flattened_lists)
[[[-74.18445299999996, 40.694995999999904], [-74.18448899999999, 40.69509499999987], [-74.18449799999996, 40.69518499999987], [-74.18438099999997, 40.69587799999989], [-74.18428199999994, 40.6962109999999], [-74.18402099999997, 40.69707499999989]...
Then I use the function below that applies string concatenation and basically:
list[list[f64]]
object to String.def polygon_to_wkt(polygon):
# Convert each coordinate pair to a string formatted as "lon lat"
coordinates_str = ", ".join(f"{lon} {lat}" for lon, lat in polygon)
# Format into a WKT Multipolygon string (each polygon as a single polygon multipolygon)
return f"""MULTIPOLYGON ((({coordinates_str})))"""
formatted_wkt = [polygon_to_wkt(polygon) for polygon in flattened_lists]
poc = poc.with_columns(pl.Series("WKT", formatted_wkt))
Finally, I use the method wkt.loads("MultiPolygon ((()))").area
to compute the shape area of the Multipolygon object
def convert_to_shapely_area(wkt_str):
try:
return wkt.loads(wkt_str).area
except Exception as e:
print("Error converting to WKT:", e)
return None
poc = poc.with_columns(
pl.col("WKT").map_elements(convert_to_shapely_area).alias("shapely_geometry")
)
print(poc.head())
Even though for the first shape the WKT correctly returns the area of the object, while for the second MultiPolygon the methods returns the following error:
IllegalArgumentException: Points of LinearRing do not form a closed linestring
What I have noticed between the two rows, is that the multipolygon of Newark Airport is a continues object of list[list[f64]]] coordinates. Whereas, the Jamaika Bay has multiple sublists [list[list[f64]]] elements (check column coordinates to verify this). Also the screenshot below verify this statement.
Thus, is there any way to unify the multipolygons of Jamaica Bay into one single GEOmetric object before applying WKT?
P.S: Many solutions on GitHub use the shape file, but I would like to regularly re-download the NYC zones automatically with the SODA API.
To download the raw .json file from SODA API (omit logger_object
lets pretend it's print()
)
import requests
params = {
"$limit": geospatial_batch_size, #i.e. 50_000
"$$app_token": config.get("api-settings", "app_token")
}
response = requests.get(api_url, params=params)
if response.status_code == 200:
data = response.json()
if not data:
logger_object.info("No data found, please check the API connection.")
sys.exit()
with open("{0}/nyc_zone_districts_data.json".format(geospatial_data_storage), "w") as f:
json.dump(data, f, indent=4)
else:
logger_object.error("API request failed.")
logger_object.error("Error: {0}".format(response.status_code))
logger_object.error(response.text)
I have eventually found a solution for this problem. As already described I am flattening the list of coordinates (a.k.a Points). A point in geospatial data is a (x,y) coordinate. Thus, a MultiPolygon is a combination of multiple Points.
def flatten_list_of_lists(data) -> list:
return [subitem2 for subitem1 in data for subitem2 in subitem1]
, this abovementioned function is important because the object data
used as input argument is of type [list[list[list[f64]]]]
and a MultiPolygon
has a specific cardinality level per point.
Then I transform its flattened list to a MultiPolygon object using shapely
from shapely.geometry import Polygon, MultiPolygon, Point
def transform_polygons_to_multipolygons(flat_list:list) -> list:
return [ MultiPolygon( [Polygon(coord) for coord in polygon]).wkt for polygon in flat_list]
You will notice that after creating each MultiPolygon
object I save it to WKT
format (thus as a string).
Finally, I compute the area of the multipolygon
from shapely import wkt
def compute_geo_areas(multipolygons:MultiPolygon) -> float:
return wkt.loads(multipolygons).area
The final code
flattened_lists:list = [flatten_list_of_lists(row) for row in df_geo["coordinates"].to_list()]
multipolygons:list = transform_polygons_to_multipolygons(flattened_lists)
df_geo = df_geo.with_columns(pl.Series("multipolygons", multipolygons)) \
.with_columns(polygon_area=pl.col("multipolygons").map_elements(compute_geo_areas)) \
.drop("coordinates")
, where df_geo
is the Polars DataFrame loaded from the JSON file attached on the SO question.