Search code examples
sqldatabase-designforeign-keysrelational-databasenormalization

How best to normalize and reference (FK) locations (Neighborhood/City/Region/Country/Continent)


So I have searched around but haven't found a satisfactory answer.

I have different types of locations, as stated in the title. Given a type of location (i.e. city), the less granular locations can be inferred. I.e. if you know you're in Oregon, it implies you're in the United States, which implies you're in North America.

We have Objects that reference locations, but the granularity is not all the same. Some items might point to neighborhoods, others are only known down to the city level, while some are only known to a region, etc.

There were two ways in which I thought of organizing the data, this is the way I am leaning towards:

  • Have a generic "Locations" table, with a location "type" and a "parent location" referencing itself. So there'd be an entry for United States of type country, and an entry for Oregon type state which references United States. i.e.

enter image description here

You can then have the object reference the location off its primary key, and then other locations can be inferred. Does this make sense or is there a better way I could be organizing the data?

  • The other way I considered was with a different table for each location "type" but then the problem is having our objects referencing it, since the most granular type of location for an object isn't always the same.

If I were to slip other location types in later, for example counties in between Cities and Regions, might this present a problem? I'm thinking it would be no more a problem than with separate tables, but perhaps there's a better way I can keep track of things in a logical way.


Solution

  • This is a case of subclasses, often called subtypes. It's complicated by the fact that some subtypes are contained in other subtypes. The container issue is well handled by classical elementary relational database design.

    The subclass issue requires a little explanation. What OOP calls "subclasses" goes by the name "ER Specialization" in ER modeling circles. This tells you how to diagram subclasses, but it doesn't tell you how to implement them.

    It's worth mentioning two techniques for implementing subclasses in SQL tables. The first goes by the name "single Table Inheritance". The second goes by the name "Class Table Inheritance". In class table inheritance, you will have one generic table for "locations" with all the attributes that are common to all locations, regardless of type. In the "Cities" table you will have attributes that pertain to cities, but not to countries, etc. You will have other subclass tables for the other types of locations.

    If you go this route, you should look up another technique, called "Shared Priomary Key". In this technique, the id field of the subclass tables all contain copies of the id field from the superclass table. This requires a little effort, but it's well worth it.

    Shared primary key offers several advantages. It enforces the one-to-one nature of a subclass relationship. It makes joining specialized data with generalized data simple, easy, and fast. It keeps track of which items belong in which subclass, without an extra field.

    In your case, there is yet another advantage. Other tables that reference a location by using a foreign key don't have to decide whether to reference the superclass table or the subclass table. A single foreign key that references the superclass table will also implicitly reference one of the subclass tables, although it isn't obvious which one.

    This isn't perfect, but it's very, very good. Been there, done that.

    For more information, you can google the techniques, or find relevant tags here in SO.