We are developing a system using MySQL database involve Holiday.
Some holiday may only apply to certain states, Example: Good Friday only apply to Hawaii, Kentucky, Louisiana and etc.
holiday | state
---------------------
Good Friday | Connecticut, Delaware, Hawaii, Indiana, Kentucky, Louisiana, New Jersey, North Carolina, North Dakota, Tennessee
Our team have some discussion on how should state store in MySql database after normalisation.
At first we come out the design like following (some columns like date or holiday id are skipped):
Design A
holiday | state code
------------------------
Good Friday | CT
Good Friday | DE
Good Friday | HI
Good Friday | IN
Good Friday | KY
Good Friday | LA
Good Friday | NJ
Good Friday | NC
Good Friday | ND
Good Friday | TN
But one of our team member propose another design:
Design B
holiday | state
---------------------
Good Friday | CT,DE,HI,IN,KY,LA,NJ,NC,ND,TN
Design B looks like contradict with what I learn about database design in terms of normalisation, but my team member insist this is the correct method, and never see people design table like Design A
May I know which one more proper? Or is there some new method in database design that I didn't know that cause Design B is used in current database design?
Thanks
[Edit]
To make things more specific, the purpose of the holiday table is to calculate due date:
date_begin + N working days = due_date
If N = 3 and date begin is 2020-04-07, (assume Saturday is working day)
2020-04-07 + 3 = 2020-04-10 // Good Friday
If the case register in Connecticut, where 2020-04-10 is holiday, thus the due date will be 2020-04-11, but if the case register in other state, example Washington DC, the due date will be 2020-04-10.
The typical design for this would be something like:
Holidays
---------
ID | Name | ....
--------------------------
1 |Good Friday |...
2 |Easter Monday|...
3 |Christmas Day|...
...
States
----------
ID | Name
---------------------------
1 | Connecticut
2 | Texas
3 | Vermont
....
State_Holiday
----------------
State_ID | Holiday_ID
----------------------
1 | 1
2 | 1
3 | 2
.....
By using a table to define which holidays exist, and referring to them via a foreign key (holiday_id), you avoid typos - in design A, someone might enter "good friday" (not capitalization), and your application would be unclear about whether that's the same as "Good Friday".
Ditto for states - the state abbreviation codes may suffice as primary keys, but I'm not certain they are guaranteed unique (is there another country that uses these codes?)
The joining table State_Holiday makes it easy to find which holidays are observed in a given state, or which states observe a given holiday.
[Edit] Now you've outlined your sample use case, imagine executing that using design B.
You'd have to do a whole bunch of string parsing to find out whether a Connecticut has a holiday on a day that falls between date_begin and date_begin + N. You have to handle the delimiter, and the fact you cannot guarantee at a data model level that state codes are 2 characters. The comparison operator would almost certainly be something along the lines of and state like '*CT*'
, which would perform terribly because indexes don't work. You'd need application-level logic to make sure the state codes are correct - you can't enforce referential integrity in the data model.
Option A is much simpler - you're just comparing a bunch of foreign keys, rather doing string comparison. Your referential integrity is enforced by the data model, rather than application logic.
Option A is preferred by people used to working with relational databases; it's normalized, uses referential integrity and should perform well, no matter how many holidays you have.
Option B is a "NoSQL" approach. That's a whole different conversation, but I would not force a NoSQL design into a MySQL system.