Search code examples
mysqldatabase-normalization

Normalization of Licence Classes


How do I store data about classes of licences?

Data subset is People who have Licences and Licences have Classes.

Thanks to @karmakaze for this drawing.

+------+    +-------+     +---------+
|person|---*|licence|---1*|lic_class|
+------+    +-------+     +---------+

The people table has person_id and various name fields.

The licences have unique numbers issued by a government agency with an expiry date, licences have multiple classes.

Originally I was going to structure the licenses table as a linking table between lut_training (a.k.a lic_class above) and people

CREATE TABLE `licences` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `person_id` int(10) unsigned NOT NULL,
  `training_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `person_id` (`person_id`),
  KEY `training_id` (`training_id`),
  CONSTRAINT `licences_ibfk_3` FOREIGN KEY (`person_id`) REFERENCES `people` (`person_id`) ON UPDATE CASCADE,
  CONSTRAINT `licences_ibfk_4` FOREIGN KEY (`training_id`) REFERENCES `lut_training` (`training_id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

The lut_training table was originally just a lookup table to handle short and long names for licence classes e.g. (id="6", short="1F", long="1F Armed Guard").

Then I needed to put the licence_number and expiry_date somewhere, so it can either go in licences (probably best place for it) or lut_training. I settled for lut_training, but this is probably poor DB design.

CREATE TABLE `lut_training` (
  `training_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `training_short` varchar(10) COLLATE utf8mb4_unicode_ci NOT NULL,
  `training_long` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
  `expiry_date` date DEFAULT NULL,
  `licence_number` int(10) unsigned NOT NULL,
  PRIMARY KEY (`training_id`),
  KEY `training_long` (`training_long`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

I then discovered that each licence class does not get its own number or expiry date unlike the 8 other government licences I have. Plus classes can be added and removed at will. So there is no point in putting expiry_date and licence_number fields in the lic_class table.

10 classes currently exist 1A,1B,1C,1D,1E,1F,2A,2B,2C,2D. Someone may start with 1A & 1E, add 1C and then drop the 1A and retain 1E & 1C until their licence expires.

So now it seems stupid and inefficient to create lut_training and put a row in for every licence class when lots of that data is duplicated because most people in the dataset are multi-class licensees.

So I could do this and use 1 field in licence to hold a licence's classes:

+------+    +-------+
|person|---*|licence|
+------+    +-------+

That's a bit like how WordPress.org wp_options table is a mess of key:value pairs.

As SQL

CREATE TABLE `licences` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `person_id` int(10) unsigned NOT NULL,
  `expiry_date` date DEFAULT NULL,
  `licence_number` int(10) unsigned NOT NULL,
  `licence_types` varchar(40) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  KEY `person_id` (`person_id`),
  CONSTRAINT `licences_ibfk_3` FOREIGN KEY (`person_id`) REFERENCES `people` (`person_id`) ON UPDATE CASCADE,
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

The bit I'm struggling with is licence_types varchar(40) COLLATE utf8mb4_unicode_ci NOT NULL, or even licence_types smallint NOT NULL,

That seems to violate database normalization specifically 1NF first normal form.

Plus I would have to do string or integer manipulation in order to update licence classes if I use a string or integer field instead of a separate lic_class table. 10 licence classes means 2^10=1024 bits of data or some pseudo- or literal- array field to store all the licence class combinations a person holds. That's not a lot of space.

Directions on how to approach this are appreciated.

Hopefully that is clearer for the detractors.


Solution

  • As I see it, a person can have zero or more licences (from different governments) and each license can have one or more licensed classes.

    +------+    +-------+     +---------+
    |person|---*|licence|---1*|lic_class|
    +------+    +-------+     +---------+
    

    The expiration date will go with the license. You can choose to denormalize the licenced classes into license depending on how much/little data there is per licensed class.

    The tricky bit is the constraints. There should be a unique constraint on license (person_id, government). There should also be a unique constraint on lic_class(licence_id, license_type) as you can't have the same type more than once.