Search code examples
database-designrelational

Tricky relational DB design question


I have a tricky problem that I've been messing about with for a few days now and cant find an optimal solution for.

These are my tables:

  • site
  • site_node
  • page

The site node table contains a list of nodes representing a hierarchy (using nested set). Each node must have one or more associated pages. Each site must have one associated error page and one not found page.

So, a page must either belong to a node, or a site as an error or not found page. The solutions I'm currently toying with are:

  1. parentType and parentId fields on the page table, where type would either be "node", "site_error" or "site_notFound" and the id would be the site or node id (whichever is relevant to the type).
  2. nodeId field on the page table that can be null, and then errorPageId and notFoundPageId fields on the site table.

Option #1 ensures that each page belongs to one and only one other entity, although the relationship cant actually be enforced as the parentId field can point to more than one place.

Option #2 is cleaner, but it's basically saying that the site "belongs" to the two error and not found pages, and that's probably bad practice.

Any thoughts or suggestions?
Thanks,
Jack


Solution

  • Option #1 ensures that each page belongs to one and only one other entity, although the relationship cant actually be enforced as the parentId field can point to more than one place.

    Right. In terms of relational theory, the problem is that your "parentId" column violates third normal form, because its meaning varies per row, based on the value in parentType (a non-key column).

    You wouldn't have a properly designed database if a single column could contains someone's phone number or their birthdate, per row, depending on some other flag. Those are two different facts about the person, and they each deserve their own column. Likewise, storing both site_id or node_id in a single column would have the same problem.

    Another clue that this is a flawed design is that you can't declare a foreign key constraint to point to either of two referenced tables.

    Option #2 is cleaner, but it's basically saying that the site "belongs" to the two error and not found pages, and that's probably bad practice.

    I see why you're saying that, because of the belongs to conventions in Rails-like frameworks. But these are conventions; they aren't necessarily the only relationship that foreign keys can model. You can make one entity refer to exactly one other entity, in a has one relationship. In this case, the foreign key reverses direction.

    I would say it's logically true that the Error page and the Not Found page belong to the site, not the other way around. And the way to make them mandatory is to have another entity reference these pages, and apply the NOT NULL constraint to these references. This is what you've described.

    CREATE TABLE site (
      . . .
      error_page_id     INT NOT NULL,
      notfound_page_id  INT NOT NULL,
      FOREIGN KEY (error_page_id)    REFERENCES pages (page_id),
      FOREIGN KEY (notfound_page_id) REFERENCES pages (page_id)
    );
    

    This meets your immediate need, it's enforceable, and it's in Normal Form.


    @NXC suggests making dummy nodes for Error and Not Found pages. Although this allows these nodes to be stored in the node hierarchy, it fails to enforce that a site must have these pages. That is, a site could be stored without references to these nodes.

    @Tony Andrews suggests storing two columns in each page, site_id and site_node_id, and adding a CHECK constraint to ensure that exactly one of these is non-NULL. This seems better than the parent_id/parent_type option, but it still doesn't offer any enforcement that every site must have an Error and a Not Found page.