Search code examples
mysqldatabase-design

Can you define a foreign key in a table from which the table with the primary key is derived?


Let's say I have a table called "Students".

This table has the columns

  • StudentId(primary key),
  • StudentName,
  • StudentSurName.

Let's immagine that from this table I create another table "OriginSurnames" that contains the country of origin of the distinct surnames of the students in the table "Students".

This table has the columns:

  • StudentSurName(primary key)
  • Origin

Does it make sense to declare StudentSurName in the table Students a foreign key condering that I cannot have an entry of "StudentSurName" in the table OriginSurnames if it doesn't exist in the table "Students" .


Solution

  • Problem

    The technical answer is easy, but the problem is a matter of understanding on your part.

    Understanding the data (the purpose of modelling), and designing a database (the last step of modelling) is one science, with a complete set of methods and procedures, that have not changed since 1970. It is an application of Logic, yes, but specifically for data.

    Writing code, program design, good practice; etc, is quite a different science. Although it is a different application of Logic. If a person is good at one science, they are usually poor at the other. It is very rare for a person to be good at both ... something which modern coders do anyway, without understanding the relevance of this. And it always leads to problems.


    The Data

    Does it make sense to declare StudentSurName in the table Students a foreign key condering that I cannot have an entry of "StudentSurName" in the table OriginSurnames if it doesn't exist in the table "Students" .

    You are getting confused because you are not separating the data (and the Normalisation and design issues that must be considered) vs the program (and what it should do, and when).

    Let's take the data, as data, and nothing but data.

    • The simple fact is, the OriginSurname contains one occurrence of each Surname, and Student contains many occurrences of Surname. Therefore the relation is one OriginSurname.Surname to many Student.Surnames.

    • Now for the Consistency in the database. You want the OriginSurname.Surname to exist only if there is one or more Student.Surnames. That is Cardinality. The relation is therefore one OriginSurname.Surname to one-to-many Student.Surnames (as opposed to 0-to-many).

    Do not worry about how to obtain that at this point, because that has to do with the program, not the database, and the database is an independent, self-described and integrated storage for the data.


    Solution • Initial

    It looks like this:

    alvyTA1

    Note • Notation

    • All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993

    • My IDEF1X Introduction is essential reading for beginners.

    Note • Content

    You have one good database table OriginSurname, and one horrible Student file. Why ? Because while you have a Record ID declared as Primary Key, which ensures the Record ID is unique, you have nothing to keep the rows of data unique. The Relational Model demands that the logical rows (as opposed to records) in a table are unique.

    This should fail, we want it to fail, but it will succeed:

    INSERT Student VALUES( "John", "Smith" ) -- should succeed
    INSERT Student VALUES( "John", "Smith" ) -- duplicate row: should fail
    INSERT Student VALUES( "John", "Smith" ) -- duplicate row: should fail
    

    Solution • Corrected

    For the columns that you have given, you need an index on ( Surname, Name ). Now the rows (regardless of what the Record ID is) are unique. A second John Smith will now fail. It looks like this:

    alvyTA2

    Finally ...

    Does it make sense to declare StudentSurName in the table Students a foreign key [referencing the PK in the OriginSurname table]

    Yes. Absolutely. That is how Referential Integrity is maintained in the database, by the database platform. Conversely, if you do not declare that Foreign Key, there will not be an integrity between OriginSurname.Surname and Student.Surname.


    The Program

    Now we will take up the considerations for the program that is required to maintain the database, and the particular Consistency that we have defined in the database. That is, the Consistency is first defined in the database, and second, maintained by the relevant program components that interact with the database tables.

    surname cannot [must not] exist in the table OriginSurnames if there's no one that posseses it in the table "Students"

    Agreed.

    because the surname is inserted in the table OriginSurnames table after a student with that surname has been inserted in the table Students.

    That is wrong thinking. In SQL, since 1981, we have had ACID Transactions. We do not write anything to the database via direct INSERT/UPDATE/DELETE, because that would leave the database Inconsistent, especially if the program or the SQL platform, or the computer system, fails. All writes are properly packaged, and written together, as a Transaction.

    1. First understand that the database must be Consistent, before; during; and after the writes to the database. That is the C in ACID.

      • In our definition, which is correct, OriginSurname is the parent row, and we cannot have a child Student without a parent.

      • Therefore when a new Surname is added to Student, it must be INSERTed into the parent OriginSurname before the INSERT into Student.

      • Likewise, when a Student is DELETEd, if it is the last occurrence of that Surname, the DELETE Student must be followed with a DELETE OriginSurname.

    2. The A stands for Atomic. The idea is, a Transaction consists of more than one write to the database, as does ours, and the Transaction is applied completely, or not at all. Hence Atomic. The Transaction is bracketed with BEGIN TRAN and COMMIT TRAN (or ROLLBACK TRAN if the code is required to exit a partially applied Transaction).

    The Student_Add Transaction will be as follows. I am providing the skeleton code, you need to add error checking, etc:

    BEGIN TRAN
    IF NOT EXISTS ( SELECT 1  
            FROM OriginSurname     -- test if parent not exists
            WHERE Surname = @Surname
            )
        INSERT OriginSurname       -- insert parent first
            VALUES( @Surname, @Origin )
    INSERT Student                 -- insert child second
        VALUES( @Name, @Surname )
    COMMIT TRAN
    

    The Student_Drop Transaction will be as follows:

    BEGIN TRAN
    DELETE
        FROM  Student              -- delete child first
        WHERE Name = @Name
        AND   Surname = @Surname
    IF NOT EXISTS ( SELECT 1
            FROM Student           -- test if parent obsolete
            WHERE Surname = @Surname
            )
        DELETE
            FROM  OriginSurname    -- delete obsolete parent second
            WHERE Surname = @Surname
    COMMIT TRAN
    

    The "before" or "after" in your comments simply does not apply, because we are executing an Atomic Transaction, within which we arrange the SQL verbs to match the definition of the database, and we maintain database Consistency throughout.

    ACID Transaction

    To be complete on ACID Transactions:

    • The I stands for Isolation.
      During the Transaction, ie. between the BEGIN TRAN and the COMMIT TRAN, the changes to the database that are being written, are not complete. We do not want other active database users to see this incomplete data, precisely because the Transaction is Atomic, they must continue to see only Consistent data. Therefore the Transaction will be executed by the platform in Isolation from the other users. Nothing for you to do, except to understand that that is how it happens. It is during this tiny period that resources are locked, and concurrency is an issue (to the extent that it has not been mitigated by OLTP Standards).

    • The D stands for Durable.
      Every Transaction must be durable, ie. once the COMMIT has been executed, the Transaction must persist in the database, it must not be lost due to a system crash, or a bomb destroying the building. That is a facility that the SQL platform provides, in and of itself, nothing for you to do as a coder. Each platform provides various recovery facilities, and the DBA has to set them up, to the degree that the business requires recovery from a hard crash.

    Trigger

    Do not use triggers. They cause many more problems than they allege to fix. More importantly, if you are using Transactions as advised (and no direct writes to the database), there is no need for them.


    Freeware "SQL"

    In the first three decades following the advent of the Relational Model, and SQL its data sub-language, all SQL platforms were real platforms (Server Architecture; full compliance with the SQL Standard; variety of recovery and Durability facilities; etc). ACID Transactions have been in the SQL Standard since its first release in 1981.

    In the 00's, various pretend-SQL suites became available as freeware; shareware; vapourware; noware; etc. These are mickey mouse suites (we cannot call them platforms), with no Server Architecture; no decent backup or recovery; and no compliance with the SQL Standard. Which means their use of the term SQL is incorrect. At best they are pretend-SQLs.

    PostgreSQL has no ACID Transactions. As a sort-of; kinda; looks-like feature, the Functions are declared to be "transactional". Therefore the Standard SQL BEGIN/COMMIT/ROLLBACK TRAN verbs are missing, and you have to place that code in a Function. And hope there are not too many online users.

    MySQL does have transactions, but they are not SQL compliant, ie. they are not true ACID. It should be adequate for the simple kind of work you are doing. The MySQL verb is BEGIN WORK. And you have to set the table to TYPE = InnoDB.