I am using OSMnx to query the Overpass API. I've noticed that it has a fairly large default for the maximum query area size:
OVERPASS_MAX_QUERY_AREA_SIZE = 50*1000*50*1000
This value is used to subdivide "larger" polygons into chunks to submit to the Overpass API.
I'd like to understand why the area is so large. For example, the entirety of San Francisco (~50 sq miles) is "simplified" to a single query.
Key questions:
Is there any advantage to reducing query sizes submitted to the Overpass API?*
Is there any advantage to reducing the complexity of shapes/polygons being submitted to the Overpass API (that is, using rectangles with just 4 corner coordinates), versus more complex polygons?**
*Note: Example query that I would be running (looking for the ways that would constitute a walk network):
[out:json][timeout:180];(way["highway"]["area"!~"yes"]["highway"!~"cycleway|motor|proposed|construction|abandoned|platform|raceway"]["foot"!~"no"]["service"!~"private"]["access"!~"private"](37.778007,-122.445467,37.783454,-122.438958);>;);out;
**Note: This question is partially answered in this other post. That said, that question does not focus completely on the performance implications, and is not asked in the context of the variable area threshold used in OSMnx to subdivide "larger" geometries.
max_query_area_size appears to be a heuristic value someone came up with after doing a number of test runs. From the Overpass API side, this figure has pretty much no meaning on its own.
It may be completely off for different kinds of queries, or even for a different area than SF. As an example: for infrequent tags, it's usually better to use one rather large bounding box than to fire off a huge number of queries with tiny bounding boxes.
For some statement types, though, a large bounding box may cause significantly longer processing times. In this case, splitting up the area into smaller pieces may help. Some queries might even consume too much memory, which forces you to split your bounding box into smaller pieces.
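If splitting does turn out to be necessary, one simple approach is a regular grid of tiles. The following is only a minimal sketch, not how OSMnx subdivides geometries internally; the function name split_bbox and the example coordinates (a rough box around San Francisco) are assumptions made for illustration.

```python
def split_bbox(south, west, north, east, rows, cols):
    """Split a (south, west, north, east) box into rows x cols tiles.

    Hypothetical helper for illustration. Tiles use the same
    (south, west, north, east) order as the Overpass bbox filter.
    """
    lat_step = (north - south) / rows
    lon_step = (east - west) / cols
    # Build the tile edges once and pin the last edge to the exact
    # outer bound, so rounding never leaves a gap at the border.
    lats = [south + i * lat_step for i in range(rows)] + [north]
    lons = [west + j * lon_step for j in range(cols)] + [east]
    return [
        (lats[i], lons[j], lats[i + 1], lons[j + 1])
        for i in range(rows)
        for j in range(cols)
    ]


# Example: a rough box around San Francisco, cut into 2 x 2 tiles.
tiles = split_bbox(37.70, -122.52, 37.82, -122.36, rows=2, cols=2)
for tile in tiles:
    print(tile)
```

Ways that cross a tile boundary will be returned by more than one tile, so the results need to be de-duplicated by element id after merging.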
As you didn't mention the kind of query you want to run, it's very difficult to provide general advice. It's like asking for the best way to write SQL statements without providing any additional context.
Using bounding boxes instead of (poly:...) has performance advantages. If you can specify a bounding box, use the respective bounding box filter rather than providing 4 lat/lon pairs to the poly filter.
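To make the difference concrete, here is a small sketch that builds both filter forms for the same rectangle, using the coordinates from the example query in the question. The helper names bbox_filter and poly_filter are made up for this illustration, and the query is shortened to a plain way["highway"] selection.

```python
def bbox_filter(south, west, north, east):
    """Overpass QL bounding box filter: (south,west,north,east)."""
    return f"({south},{west},{north},{east})"


def poly_filter(coords):
    """Overpass QL polygon filter: (poly:"lat1 lon1 lat2 lon2 ...")."""
    pairs = " ".join(f"{lat} {lon}" for lat, lon in coords)
    return f'(poly:"{pairs}")'


south, west, north, east = 37.778007, -122.445467, 37.783454, -122.438958

# Preferred for a rectangle: the dedicated bounding box filter.
fast = bbox_filter(south, west, north, east)

# Same rectangle written as a 4-corner polygon: valid, but the
# variant to avoid when a plain bounding box would do.
slow = poly_filter([(south, west), (south, east), (north, east), (north, west)])

query = f'[out:json][timeout:180];(way["highway"]{fast};>;);out;'
print(query)
```

Both filters select the same area here; only the bounding box form benefits from the faster code path described above.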