I have table USER
with schema USER(user_id, email, first_name, last_name, ...)
.
Email is a unique value in my database, user_id is a primary key. So, user_id and email are candidate keys.
Does it mean that I have a transitive dependency here (user_id -> email -> (first_name, last_name, ...)
) and thus the database is not in NF3?
Noting the Relational Database
tag.
The main problem, that confuses everything, is that there is one Relational Model, and then various nonsense that is promoted by others as "relational".
This is a collection of fragments, that is forever changing. They are now up to several "relational algebras"; 17 abnormal "normal forms" (they have changed the original Normal Forms to suit their own purposes); etc.
The following terms do not exist in the Relational Model, they are terms used to elevate the fragments in their RFS to obtain a little bit of relational appearance.
1960's RFS are characterised by a Record ID
(yours is user_id
), which is a physical (not logical) pointer. This is expressly prohibited in the RM, which is logical, and breaks away from the pointer-based systems that preceded it.
The central difference (not the only difference) between the 1960's RFS and the RM is, whereas the RFS the relates physical record using physical pointers such as Record ID
, the RM relates logical rows of data (not records) using natural Keys that occur in the data.
The non-commercial platform providers that arose in the 1990's typically make reference to these additions, or supply provisions that are prohibited in the RM, and sometimes provide features to ease the pain caused by them.
The Relational Model is freely available (in those days, only qualified people attempted database design). However, the terms are dated and thus not understood today. And it is seminal, dense. The proponents count on that, in order to promote their approach as "relational".
I will answer for the Relational Model. In order to avoid going over the same item more than once, I will take the issues in logical sequence.
The RM requires that each row (not record, because it is beyond records) is unique.
In the RM, one or more Unique Keys are selected for each table. Each Key must be "made up from the data". Further, each Key must be non-redundant, the minimal Key. If there is more than one Key, one is chosen as Primary Key, which is migrated to the subordinate rows as a Foreign Key. Once the Primary Key is chosen, any Keys that remain are Alternate Keys.
While the Keys may be loosely called "candidates" before the election, after the election they are no longer "candidates", they are losers.
The use of the term "candidate" serves only to (a) maintain the tension of not choosing a Primary Key from one of the "candidates" (as required by the RM), and (b) thus allow a non-Key (such as a fabricated
Record ID
oruser_id
as aPRIMARY KEY
.A
Record ID
field does not exist in the data, it is manufactured by the system (GUID; AUTO INCREMENT
; etc). Such a field is great for perceiving the data in physical terms (RFS), as if it were a stupefying grid, and therefore suppressing the perception of data as logical components the are related (the RM).A genuine Key has many important properties. Declaring a non-Key to be a
PRIMARY KEY
, which is possible in SQL, does not magically give the non-Key any of those properties.
USER(
user_id,
email, first_name, last_name, ...)
email
is an unique identifier. Great. In fact, for that schema, it is the only Key, so you do not have to concern yourself with choosing one from many possibles.Here is a comparison.
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993.
My IDEF1X Introduction is essential reading for beginners.
Now you want to check if the table satisfies 3NF. Dmitry's intention is correct, although his definition might not be. Dr E F Codd's 3NF, not the pretenders:
A row is in third normal form iff every non-prime attribute is Functionally Dependent on the Key, the whole Key, and nothing but the Key.
The RM has only "full" Functional Dependence, it does not assist those who fragment the Key to deal with their fragments.
Transitivity is a logical and mathematical term, it is not subtracted in the RM, rather, it simply does not apply, it is not required, because it deals with the whole Key, whereas RFS deals with fragments of the Key.
first_name, last_name, ...
are each Functionally Dependent on email
, and nothing but email
.
Therefore it satisfies 3NF.
Sorry, I can't help you there, because it is an ever-changing and unreliable mess that deals with fragments of data instead of recognising the atoms, and they will argue endlessly, resolving nothing. They work on notions that are not in the RM, and therefore fraudulently called "relational", while retaining the essential parts of the 1960's Record-oriented paradigm: the additional field and additional index for the physical Record ID
; and the reference by physical Record ID
. Both of which are prohibited by the RM.
This section is presented in accordance with the SO guidelines, specifically: to correct misinformation whenever you see it. I did respond to the comments, but they keep disappearing. Thus I have placed it here.
philipxy:
Codd's 1971 [paper] "Further normalization of the data base relational model" introduced "normalization" in the sense of decomposition to higher NFs, including "FD", "CK", "one of its CKs is arbitrarily designated as PK", "2NF" in terms of "partial/full FDs" & "3NF" in terms of "transitive/non-transitive FDs".
that that content is in the 1971 paper
the Jun 1970 Relational Model paper does not have that content.
as such, the 1971 paper can be dismissed.
The 1971 paper, and the "RM/Tasmania" articles and presentations, have the explicit purpose of assisting the then entrenched DBMS platform users to implement a bit of relational capability into their systems without changing the platform, the reference-by-physical-pointer paradigm (mindset & implementation).
After the Relational Model became accepted, around 1985, when all the DBMS platform suppliers started switching to supplying RDBMS platforms, ie. the reference-by-physical-pointer platforms became extinct, the 1971 paper, which was previously near and dear to them, became obsolete.
Therefore, again, the 1971 paper can be dismissed.
The only article (not a formal paper, but widely accepted as one) that is relevant to the RM is the 1985 ComputerWorld article commonly known as Codd's Twelve Rules, which gives the rules for both (a) a DBMS Platform, and (b) a database, to be accepted as genuinely Relational. That was to overcome the problem of DBMS platform suppliers adding Relational bits and pieces and then labelling their product "relational".
Still confused? A scientific person would:
recognise that the RM contradicts the 1971 paper, and the 1971 paper contradicts the RM,
recognise that they cannot both be true at the same time,
apply the Law of Non-Contradiction,
and cancel the 1971 paper.
the non-resolution of the purported issue [4],
the suppression of the RM (atomicity; Codd's three NFs unchanged; the Relational Key; "full" Functional Dependence on the Relational Key),
the elevation of the obsolete 1971 paper (fragmentation; ever-changing re-definitions of 17 NFs; physical RecordID
as "key"; partial FDs to deal with the fragments; etc),
the declaration that the 1971 paper, that was obsolete as of 1985, that contradicts the Relational Model, is "the relational model". As evidenced by philipxy and others.
are incorrect, and are in disagreement with the original relational model.
It is reinforced by additional acts:
Erwin Snout:
When cut down to its bare essence, the relational model of data has no more than exactly one "rule" : all information in the database must be represented as values of attributes in tuples in relations.
There are over 40 rules in the Relational Model and the Twelve Rules. Reducing them to one pithy rule is not compatible with the RM.
which eliminates the possibility of genuine data modelling
perpetuating the use of the totally obsolete ERD for data analysis & modelling, which does not support the central article of the Relational Model, the composite Relational Key, and thus maintains the perception of fragments
suppressing the use of IDEF1X, which is designed for the Relational Model, available since 1985, the NIST Standard for modelling Relational data since 1993. It formalises the concepts in the RM.