Search code examples
mysqldatabase-designreferential-integritydata-integrity

How can I best maintain integrity between two columns in a table?


Hypothetically, I have an ENUM column named Category, and an ENUM column named Subcategory. I will sometimes want to SELECT on Category alone, which is why they are split out.

CREATE TABLE `Bonza` (
   `EventId`     INT UNSIGNED NOT NULL AUTO_INCREMENT,
   `Category`    ENUM("a", "b", "c") NOT NULL,
   `Subcategory` ENUM("x", "y", "z") NOT NULL,

   PRIMARY KEY(`EventId`)
) ENGINE=InnoDB;

But not all subcategories are valid for all categories (say, "z" is only valid with "a" and "b"), and it irks me that this constraint isn't baked into the design of the table. If MySQL had some sort of "pair" type (where a column of that type were indexable on a leading subsequence of the value) then this wouldn't be such an issue.

I'm stuck with writing long conditionals in a trigger if I want to maintain integrity between category and subcategory. Or am I better off just leaving it? What would you do?

I suppose the most relationally-oriented approach would be storing an EventCategoryId instead, and mapping it to a table containing all valid event type pairs, and joining on that table every time I want to look up the meaning of an event category.

CREATE TABLE `Bonza` (
   `EventId`         INT UNSIGNED NOT NULL AUTO_INCREMENT,
   `EventCategoryId` INT UNSIGNED NOT NULL,

   PRIMARY KEY(`EventId`),
   FOREIGN KEY `EventCategoryId` REFEFRENCES(`EventCategories`.`EventCategoryId`)
     ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB;

CREATE TABLE `EventCategories` (
   `EventCategoryId` INT UNSIGNED NOT NULL,
   `Category`    ENUM("a", "b", "c") NOT NULL,
   `Subcategory` ENUM("x", "y", "z") NOT NULL,

   PRIMARY KEY(`EventCategoryId`)
) ENGINE=InnoDB;
-- Now populate this table with valid category/subcategory pairs at installation

Can I do anything simpler? This lookup will potentially cost me complexity and performance in calling code, for INSERTs into Bonza, no?


Solution

  • Assuming that your categories and subcategories don't change that often, and assuming that you're willing to live with a big update when they do, you can do the following:

    Use an EventCategories table to control the hierarchical constraint between categories and subcategories. The primary key for that table should be a compound key containing both Category and Subcategory. Reference this table in your Bonza table. The foreign key in Bonza happens to contain both of the columns that you want to filter by, so you don't need to join to get what you're after. It will also be impossible to assign an invalid combination.

    CREATE TABLE `Bonza` (
       `EventId`         UNSIGNED INT NOT NULL AUTO_INCREMENT,
       `Category`        CHAR(1) NOT NULL,
       `Subcategory`     CHAR(1) NOT NULL,
    
       PRIMARY KEY(`EventId`),
       FOREIGN KEY `Category`, `Subcategory` 
       REFEFRENCES(`EventCategories`.`Category`, `EventCategories`.`Subcategory`)
         ON DELETE RESTRICT ON UPDATE CASCADE
    ) ENGINE=InnoDB;
    
    CREATE TABLE `EventCategories` (
       `EventCategoryId` UNSIGNED INT NOT NULL,
       `Category`    CHAR(1) NOT NULL,
       `Subcategory` CHAR(1) NOT NULL,
    
       PRIMARY KEY(`Category`, `Subcategory`)
    ) ENGINE=InnoDB;