Search code examples
mysqlhierarchyhierarchical-datanested-setsadjacency-list

How many "entities" are beneath this hierarchical structure. Nested Set/Adjacency (How many POI are inside of England)


Development language and DB: PHP/MySQL

I have a table geo_places with something like 8 million geographical locations.

These places are all hierarchical, and I use

  • parent_id (adjacency),
  • lft/rgt (nested set)
  • and ancestry (enumerated).

Now, I have "points of interest" table called entities which are assigned to a geographical location, and I record against each entity the:

  • lft value of the location in the geo_places
  • and the actual ID of the geographical location.

Now I need a way to provide a directory listing with count EFIFICNETLY (but I will be caching this anyway), of all the places which are beneath a location.

For example, if I take Europe, then I should see all places which have a parent_id of Europe, and then also the amount of entities below it. Keeping in mind that a place does not get assigned directly to Europe, but might be assigned to a small village in Italy (which is a child of Europe).

You know that it is a child of Europe either because:

  • the lft value of the small village in Italy is between the lft and rgt value of the location
  • Or because the ancestry maps to the place.

For example, Europe would have an ancestry path of /1 and an ID of 5. (The 1 would signify the "World"). And then the small village in Italty would have /1/5/234/28924/124128

where 1 = World 5 = Europe 234 = Italy 28924 = Bergamo etc etc...

Anyway, this is how I have structured the dataset, and I have already been using a mixture of the hierarchical structures in order to make my queries a lot more efficient (for those of you wondering why am I am supporting nested set, adjacency and enumerated.. it's because I get the best of all structures this way).

This is an example of what I am trying to do. Example

I am flexible and can change how I manage locations if neccessary. However, this is also a multi tenant application, so I would like to try and avoid saving counts against the geo_places if it can be avoided.

So simply put: Pick a location... then show all locations which have points of interest assigned either to that location, or a child of that location.

Any recommendations?


Solution

  • This is one solution:

    select p.woeid, p.name, e.id, e.woeid, e.lft, count(e.lft) from
    geo_places as p
     join  entities as e on e.lft >= p.lft and e.lft < p.rgt
    where p.parent_woeid = 1
    group by p.woeid
    

    You would subtitute 1 for the place that you want to find the descendant of.

    Tested with 100k entities in entities and 8million rows in geo_places. Index on lft and rgt and woeid.