Search code examples
databaserelational-databasethird-normal-form

3rd normal form with unique values


I have table USER with schema USER(user_id, email, first_name, last_name, ...). Email is a unique value in my database, user_id is a primary key. So, user_id and email are candidate keys. Does it mean that I have a transitive dependency here (user_id -> email -> (first_name, last_name, ...)) and thus the database is not in NF3?


Solution

  • Noting the Relational Database tag.

    The main problem, that confuses everything, is that there is one Relational Model, and then various nonsense that is promoted by others as "relational".

    1. The one and only original Relational Model by Dr E F Codd.
    • This is a rock-solid definition that has not changed since June 1970, and does not need to change. Truth is permanent, it does not change. It is the basis upon which the commercial platform providers can build their platforms, and have done so in the 1980's.
    1. The Date & Darwen & Fagin version, which is the Record Filing Systems of the 1960's, pre-database as well as pre-Relational: the very thing that the RM replaces. Plus a couple of fragments (not whole concepts) from the Relational Model. Promoted and marketed heavily as "relational", which is not correct.
    • This is a collection of fragments, that is forever changing. They are now up to several "relational algebras"; 17 abnormal "normal forms" (they have changed the original Normal Forms to suit their own purposes); etc.

    • The following terms do not exist in the Relational Model, they are terms used to elevate the fragments in their RFS to obtain a little bit of relational appearance.

      • "transitive dependence"
      • "candidate keys"
    • 1960's RFS are characterised by a Record ID (yours is user_id), which is a physical (not logical) pointer. This is expressly prohibited in the RM, which is logical, and breaks away from the pointer-based systems that preceded it.

    • The central difference (not the only difference) between the 1960's RFS and the RM is, whereas the RFS the relates physical record using physical pointers such as Record ID, the RM relates logical rows of data (not records) using natural Keys that occur in the data.

    • The non-commercial platform providers that arose in the 1990's typically make reference to these additions, or supply provisions that are prohibited in the RM, and sometimes provide features to ease the pain caused by them.

    The Relational Model is freely available (in those days, only qualified people attempted database design). However, the terms are dated and thus not understood today. And it is seminal, dense. The proponents count on that, in order to promote their approach as "relational".


    The Relational Model

    I will answer for the Relational Model. In order to avoid going over the same item more than once, I will take the issues in logical sequence.

    1. The RM requires that each row (not record, because it is beyond records) is unique.

    2. In the RM, one or more Unique Keys are selected for each table. Each Key must be "made up from the data". Further, each Key must be non-redundant, the minimal Key. If there is more than one Key, one is chosen as Primary Key, which is migrated to the subordinate rows as a Foreign Key. Once the Primary Key is chosen, any Keys that remain are Alternate Keys.

    While the Keys may be loosely called "candidates" before the election, after the election they are no longer "candidates", they are losers.

    • The use of the term "candidate" serves only to (a) maintain the tension of not choosing a Primary Key from one of the "candidates" (as required by the RM), and (b) thus allow a non-Key (such as a fabricated Record ID or user_id as a PRIMARY KEY.

    • A Record ID field does not exist in the data, it is manufactured by the system (GUID; AUTO INCREMENT; etc). Such a field is great for perceiving the data in physical terms (RFS), as if it were a stupefying grid, and therefore suppressing the perception of data as logical components the are related (the RM).

    • A genuine Key has many important properties. Declaring a non-Key to be a PRIMARY KEY, which is possible in SQL, does not magically give the non-Key any of those properties.

    • Therefore the schema is USER(user_id, email, first_name, last_name, ...)
    1. You have recognised that email is an unique identifier. Great. In fact, for that schema, it is the only Key, so you do not have to concern yourself with choosing one from many possibles.

    Here is a comparison.

    BakTA

    • All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993.

    • My IDEF1X Introduction is essential reading for beginners.


    Now you want to check if the table satisfies 3NF. Dmitry's intention is correct, although his definition might not be. Dr E F Codd's 3NF, not the pretenders:

    • A row is in third normal form iff every non-prime attribute is Functionally Dependent on the Key, the whole Key, and nothing but the Key.

    • The RM has only "full" Functional Dependence, it does not assist those who fragment the Key to deal with their fragments.

    • Transitivity is a logical and mathematical term, it is not subtracted in the RM, rather, it simply does not apply, it is not required, because it deals with the whole Key, whereas RFS deals with fragments of the Key.

    • first_name, last_name, ...
      are each Functionally Dependent on email, and nothing but email.

    • Therefore it satisfies 3NF.


    1960's Record Filing System

    Sorry, I can't help you there, because it is an ever-changing and unreliable mess that deals with fragments of data instead of recognising the atoms, and they will argue endlessly, resolving nothing. They work on notions that are not in the RM, and therefore fraudulently called "relational", while retaining the essential parts of the 1960's Record-oriented paradigm: the additional field and additional index for the physical Record ID; and the reference by physical Record ID. Both of which are prohibited by the RM.


    Comments

    This section is presented in accordance with the SO guidelines, specifically: to correct misinformation whenever you see it. I did respond to the comments, but they keep disappearing. Thus I have placed it here.

    philipxy:
    Codd's 1971 [paper] "Further normalization of the data base relational model" introduced "normalization" in the sense of decomposition to higher NFs, including "FD", "CK", "one of its CKs is arbitrarily designated as PK", "2NF" in terms of "partial/full FDs" & "3NF" in terms of "transitive/non-transitive FDs".

    1. That quote is from the 1971 paper, not the Jun 1970 paper The Relational Model. They are two different papers. Therefore it is confirmed:
    • that that content is in the 1971 paper

    • the Jun 1970 Relational Model paper does not have that content.

    as such, the 1971 paper can be dismissed.

    1. As evidenced, Codd wrote about twelve additional papers during the decade (1970 to 1980) that he was trying to get the RM accepted. They have no value except for historical purposes, to examine the way he responded to the upheaval of the DBMS platform suppliers, that was caused by the RM: it was a paradigm shift.

    The 1971 paper, and the "RM/Tasmania" articles and presentations, have the explicit purpose of assisting the then entrenched DBMS platform users to implement a bit of relational capability into their systems without changing the platform, the reference-by-physical-pointer paradigm (mindset & implementation).

    After the Relational Model became accepted, around 1985, when all the DBMS platform suppliers started switching to supplying RDBMS platforms, ie. the reference-by-physical-pointer platforms became extinct, the 1971 paper, which was previously near and dear to them, became obsolete.

    Therefore, again, the 1971 paper can be dismissed.

    1. The only article (not a formal paper, but widely accepted as one) that is relevant to the RM is the 1985 ComputerWorld article commonly known as Codd's Twelve Rules, which gives the rules for both (a) a DBMS Platform, and (b) a database, to be accepted as genuinely Relational. That was to overcome the problem of DBMS platform suppliers adding Relational bits and pieces and then labelling their product "relational".

    2. Still confused? A scientific person would:

    • recognise that the RM contradicts the 1971 paper, and the 1971 paper contradicts the RM,

    • recognise that they cannot both be true at the same time,

    • apply the Law of Non-Contradiction,

    • and cancel the 1971 paper.

    1. The Date, Darwen, Fagin, et al, and all their followers (authors, professors, lecturers, etc), are not stupid. Therefore the following evidence acts:
    • the non-resolution of the purported issue [4],

    • the suppression of the RM (atomicity; Codd's three NFs unchanged; the Relational Key; "full" Functional Dependence on the Relational Key),

    • the elevation of the obsolete 1971 paper (fragmentation; ever-changing re-definitions of 17 NFs; physical RecordID as "key"; partial FDs to deal with the fragments; etc),

    • the declaration that the 1971 paper, that was obsolete as of 1985, that contradicts the Relational Model, is "the relational model". As evidenced by philipxy and others.

    are incorrect, and are in disagreement with the original relational model.

    Institutionalised Suppression of the Relational Model

    It is reinforced by additional acts:

    • misrepresenting Codd's Rule 1 (the after the fact View of data from the database) as the definitive, before the fact perception of data, and that that is the "table":

    Erwin Snout:
    When cut down to its bare essence, the relational model of data has no more than exactly one "rule" : all information in the database must be represented as values of attributes in tuples in relations.

    There are over 40 rules in the Relational Model and the Twelve Rules. Reducing them to one pithy rule is not compatible with the RM.

    • which eliminates the possibility of genuine data modelling

    • perpetuating the use of the totally obsolete ERD for data analysis & modelling, which does not support the central article of the Relational Model, the composite Relational Key, and thus maintains the perception of fragments

    • suppressing the use of IDEF1X, which is designed for the Relational Model, available since 1985, the NIST Standard for modelling Relational data since 1993. It formalises the concepts in the RM.