thank you for your attention. It's been a while since I learned normalization in school and university and if you go work in the industries, you will find that often these theoretic things might interfere with what you boss tells you to do. So here is an example of something I'm thinking about for a while and I would love to hear your oppinion. I'm not even sure if there is a right or wrong on this approach so every answer is welcome.
Scenario: We are developing a person management system (PMS). For this, we have a Person table, to store generic information about each person (e. g. Name), a Country table, to store the native country of a person and a Race table for the persons race.
Therefore you could have two foreign key columns on Person, linking to Country and Race, right?
However, my boss want's me to use a fourth table. Lets call it PersonType. The PersonType consists of a primary key as well as the foreign keys colums to Country and Race. The Person table then only has one foreign key column to PersonType.
Therefore I would have a n:1 relation between Person and PersonType as well as n:1 relations between PersonType and Country/Race, right?
In my opinion, the PersonType table is not necessary because you could just put the foreign key columns directly on the Person table but my boss argues that the PersonType can be used to constrain which Country/Race combinations are valid. I understand that argument but I'm asking myself if this database is still properly normalized.
(Of course we are not really developing a PMS but I thought it's easy to imagine and I can't talk about what we are really developing due to an NDA).
UPDATE 21/10/2016
Heres how the table structure looks like in an abstract way:
table person_v1(
person_id int primarykey,
name string,
country_id int foreignkey(country),
race_id int foreignkey(race)
)
table person_v2(
person_id int primarykey,
name string,
person_type_id int foreignkey(person_type)
)
table person_type(
person_type_id int primarykey,
country_id int foreignkey(country),
race_id int foreignkey(race)
)
table country(
country_id int primarykey,
name string
)
table race(
race_id int primarykey,
name string
)
Thank you for your answers so far
The number of 1:N or M:N relationships doesn't determine a relation's normal form. The question actually has nothing to do with normalization.
Some tables . . .
Your design
Follow standards when you can. For countries, I'll follow ISO 3166-1.
create table countries (
iso_country_code char(3) primary key,
country_name varchar(75) not null unique
);
insert into countries (iso_country_code, country_name) values
('USA', 'United States of America'),
('GBR', 'United Kingdom of Great Britain and Northern Ireland'),
('MKD', 'Macedonia (the former Yugoslav Republic of)'),
('ZZZ', 'Unknown country'); -- 'ZZZ' is reserved for a user-assigned value.
For race, I'll follow CDC/HL7 race codes. There are other standards. One of them might be more appropriate. See http://www.cdc.gov/nchs/data/dvs/race_ethnicity_codeset.pdf
Most applications allow multiple race codes for each person. I'm ignoring that real-world fact for this question.
create table races (
cdc_unique_id char(6) primary key,
cdc_race_concept varchar(50) not null unique
);
insert into races (cdc_unique_id, cdc_race_concept) values
('2056-0', 'Black'),
('2106-3', 'White'),
('2076-8', 'Native Hawaiian or other Pacific islander'),
('zzzz-z', 'Unknown');
create table persons (
person_id integer primary key,
person_full_name varchar(25) not null,
iso_country_code char(2) not null
default 'ZZZ'
references countries (iso_country_code)
on update cascade
on delete set default,
cdc_unique_id char(6) not null
default 'zzzz-z'
references races (cdc_unique_id)
on update cascade
on delete set default
);
All three of these tables are in at least 5NF.
One potential problem with your design is that it allows arbitrary parings of country and race. Imagine that, instead of country and race, we were talking about city and state in the USA. An arbitrary paring of city and state would allow "San Francisco, AL". But there is no city named "San Francisco" in Alabama.
That's why allowing arbitrary parings might be a bad decision.
Your boss's design
-- The same as the table above.
create table countries (
iso_country_code char(3) primary key,
country_name varchar(75) not null unique
);
insert into countries (iso_country_code, country_name) values
('USA', 'United States of America'),
('GBR', 'United Kingdom of Great Britain and Northern Ireland'),
('MKD', 'Macedonia (the former Yugoslav Republic of)'),
('ZZZ', 'Unknown country'); -- 'ZZZ' is reserved for a user-assigned value.
-- Also the same as the table above.
create table races (
cdc_unique_id char(6) primary key,
cdc_race_concept varchar(50) not null unique
);
insert into races (cdc_unique_id, cdc_race_concept) values
('2056-0', 'Black'),
('2106-3', 'White'),
('2076-8', 'Native Hawaiian or other Pacific islander'),
('zzzz-z', 'Unknown');
-- This table is new.
create table person_types (
iso_country_code char(3) not null
default 'ZZZ'
references countries (iso_country_code)
on update cascade
on delete set default,
cdc_unique_id char(6) not null
default 'zzzz-z'
references races (cdc_unique_id)
on update cascade
on delete set default,
primary key (iso_country_code, cdc_unique_id)
);
insert into person_types values
('USA', '2016-3'),
('USA', '2056-0'),
('GBR', '2016-3'),
('GBR', '2056-0'),
This "person_types" table records a fact that your database design does not. It records the fact that white persons and black persons might be native to both the USA and to Great Britain. If recording that fact is important, you must include "person_types".
Also, it's worth noting that this table is immune to the problems cited in other comments; you cannot repeatedly add rows where both iso_country_code and cdc_unique_id are null (not null
constraint), you cannot duplicate iso_country_code and cdc_unique_id (primary key
constraint), etc.
Conceptually, you decide which facts to store before you normalize. Normalization can't help you with attributes that don't exist in your schema. That's a different database design task.
-- Structurally identical to the table above.
-- Constraints are slightly different.
--
create table persons (
person_id integer primary key,
person_full_name varchar(25) not null,
iso_country_code char(2) not null
default 'ZZZ',
cdc_unique_id char(6) not null
default 'zzzz-z',
constraint person_types_fk foreign key (iso_country_code, cdc_unique_id)
references person_types (iso_country_code, cdc_unique_id)
on update cascade
on delete set default
);
All four of these tables are in at least 5NF.
The difference is not that one set of tables is more normalized than the other.
The difference is that one set of tables records a fact that the other does not.