Let's say I have a table called "Students".
This table has the columns
Let's immagine that from this table I create another table "OriginSurnames" that contains the country of origin of the distinct surnames of the students in the table "Students".
This table has the columns:
Does it make sense to declare StudentSurName in the table Students a foreign key condering that I cannot have an entry of "StudentSurName" in the table OriginSurnames if it doesn't exist in the table "Students" .
The technical answer is easy, but the problem is a matter of understanding on your part.
Understanding the data (the purpose of modelling), and designing a database (the last step of modelling) is one science, with a complete set of methods and procedures, that have not changed since 1970. It is an application of Logic, yes, but specifically for data.
Writing code, program design, good practice; etc, is quite a different science. Although it is a different application of Logic. If a person is good at one science, they are usually poor at the other. It is very rare for a person to be good at both ... something which modern coders do anyway, without understanding the relevance of this. And it always leads to problems.
Does it make sense to declare StudentSurName in the table Students a foreign key condering that I cannot have an entry of "StudentSurName" in the table OriginSurnames if it doesn't exist in the table "Students" .
You are getting confused because you are not separating the data (and the Normalisation and design issues that must be considered) vs the program (and what it should do, and when).
Let's take the data, as data, and nothing but data.
The simple fact is, the OriginSurname
contains one occurrence of each Surname
, and Student
contains many occurrences of Surname
. Therefore the relation is one OriginSurname.Surname
to many Student.Surnames
.
Now for the Consistency in the database. You want the OriginSurname.Surname
to exist only if there is one or more Student.Surnames
. That is Cardinality. The relation is therefore one OriginSurname.Surname
to one-to-many Student.Surnames
(as opposed to 0-to-many).
Do not worry about how to obtain that at this point, because that has to do with the program, not the database, and the database is an independent, self-described and integrated storage for the data.
It looks like this:
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993
My IDEF1X Introduction is essential reading for beginners.
You have one good database table OriginSurname
, and one horrible Student
file. Why ? Because while you have a Record ID
declared as Primary Key, which ensures the Record ID
is unique, you have nothing to keep the rows of data unique. The Relational Model demands that the logical rows (as opposed to records) in a table are unique.
This should fail, we want it to fail, but it will succeed:
INSERT Student VALUES( "John", "Smith" ) -- should succeed
INSERT Student VALUES( "John", "Smith" ) -- duplicate row: should fail
INSERT Student VALUES( "John", "Smith" ) -- duplicate row: should fail
For the columns that you have given, you need an index on ( Surname, Name )
. Now the rows (regardless of what the Record ID
is) are unique. A second John Smith
will now fail. It looks like this:
Finally ...
Does it make sense to declare StudentSurName in the table Students a foreign key [referencing the PK in the OriginSurname table]
Yes. Absolutely. That is how Referential Integrity is maintained in the database, by the database platform. Conversely, if you do not declare that Foreign Key, there will not be an integrity between OriginSurname.Surname
and Student.Surname
.
Now we will take up the considerations for the program that is required to maintain the database, and the particular Consistency that we have defined in the database. That is, the Consistency is first defined in the database, and second, maintained by the relevant program components that interact with the database tables.
surname
cannot[must not] exist in the table OriginSurnames if there's no one that posseses it in the table "Students"
Agreed.
because the surname is inserted in the table OriginSurnames table after a student with that surname has been inserted in the table Students.
That is wrong thinking. In SQL, since 1981, we have had ACID Transactions. We do not write anything to the database via direct INSERT/UPDATE/DELETE
, because that would leave the database Inconsistent, especially if the program or the SQL platform, or the computer system, fails. All writes are properly packaged, and written together, as a Transaction.
First understand that the database must be Consistent, before; during; and after the writes to the database. That is the C in ACID.
In our definition, which is correct, OriginSurname
is the parent row, and we cannot have a child Student
without a parent.
Therefore when a new Surname
is added to Student
, it must be INSERTed
into the parent OriginSurname
before the INSERT
into Student.
Likewise, when a Student is DELETEd
, if it is the last occurrence of that Surname
, the DELETE Student
must be followed with a DELETE
OriginSurname.
The A stands for Atomic. The idea is, a Transaction consists of more than one write to the database, as does ours, and the Transaction is applied completely, or not at all. Hence Atomic. The Transaction is bracketed with BEGIN TRAN
and COMMIT TRAN
(or ROLLBACK TRAN
if the code is required to exit a partially applied Transaction).
The Student_Add
Transaction will be as follows. I am providing the skeleton code, you need to add error checking, etc:
BEGIN TRAN IF NOT EXISTS ( SELECT 1 FROM OriginSurname -- test if parent not exists WHERE Surname = @Surname ) INSERT OriginSurname -- insert parent first VALUES( @Surname, @Origin ) INSERT Student -- insert child second VALUES( @Name, @Surname ) COMMIT TRAN
The Student_Drop
Transaction will be as follows:
BEGIN TRAN DELETE FROM Student -- delete child first WHERE Name = @Name AND Surname = @Surname IF NOT EXISTS ( SELECT 1 FROM Student -- test if parent obsolete WHERE Surname = @Surname ) DELETE FROM OriginSurname -- delete obsolete parent second WHERE Surname = @Surname COMMIT TRAN
The "before" or "after" in your comments simply does not apply, because we are executing an Atomic Transaction, within which we arrange the SQL verbs to match the definition of the database, and we maintain database Consistency throughout.
To be complete on ACID Transactions:
The I stands for Isolation.
During the Transaction, ie. between the BEGIN TRAN
and the COMMIT TRAN
, the changes to the database that are being written, are not complete. We do not want other active database users to see this incomplete data, precisely because the Transaction is Atomic, they must continue to see only Consistent data. Therefore the Transaction will be executed by the platform in Isolation from the other users. Nothing for you to do, except to understand that that is how it happens. It is during this tiny period that resources are locked, and concurrency is an issue (to the extent that it has not been mitigated by OLTP Standards).
The D stands for Durable.
Every Transaction must be durable, ie. once the COMMIT
has been executed, the Transaction must persist in the database, it must not be lost due to a system crash, or a bomb destroying the building. That is a facility that the SQL platform provides, in and of itself, nothing for you to do as a coder. Each platform provides various recovery facilities, and the DBA has to set them up, to the degree that the business requires recovery from a hard crash.
Do not use triggers. They cause many more problems than they allege to fix. More importantly, if you are using Transactions as advised (and no direct writes to the database), there is no need for them.
In the first three decades following the advent of the Relational Model, and SQL its data sub-language, all SQL platforms were real platforms (Server Architecture; full compliance with the SQL Standard; variety of recovery and Durability facilities; etc). ACID Transactions have been in the SQL Standard since its first release in 1981.
In the 00's, various pretend-SQL suites became available as freeware; shareware; vapourware; noware; etc. These are mickey mouse suites (we cannot call them platforms), with no Server Architecture; no decent backup or recovery; and no compliance with the SQL Standard. Which means their use of the term SQL is incorrect. At best they are pretend-SQLs.
PostgreSQL has no ACID Transactions. As a sort-of; kinda; looks-like feature, the Functions are declared to be "transactional". Therefore the Standard SQL BEGIN/COMMIT/ROLLBACK TRAN
verbs are missing, and you have to place that code in a Function. And hope there are not too many online users.
MySQL does have transactions, but they are not SQL compliant, ie. they are not true ACID. It should be adequate for the simple kind of work you are doing. The MySQL verb is BEGIN WORK
. And you have to set the table to TYPE = InnoDB
.