Search code examples
databasedatabase-designdatabase-normalization

Second Normal Form/Normalisation confusion: Primary Key? Compound Key?


I am making a database plan, and I am a little confused on Normalising to the Second Normal Form. I am also worried about the large number of columns, and I can't rightly figure out what to do with them.

This is my Table that I am focusing on MatchDetails:

Idea 1

Attempt with Compound Key

img link

The Player_ID is a Unique Primary Key for another table Users. MatchID is a Unique Primary Key for the table Matches. The relationship between Matches and Players is many to many.

Would this work as a compound key? In the sense that 1 player can only have taken part in a particular Match once? Do the columns to the right of MatchID Have a functional dependency on the Compound key, in the sense that they are unique TO that Compound key?

Idea 2

Table with Primary Key img link

In this example, the Participation_ID is a Unique Primary Key for the table, since there can be multiple instances of the same Player_ID and the same MatchID for various combinations of Players and Matches.

In this example, I would guess that this column is in Second Normal Form because there is only one Primary Key, and the Match Values are unique, and are thus functionally dependent? I am a little confused on Functional Dependency despite trying to read about it here.

Oh and another small thing...

The final thing that I am a little in doubt about, is the huge number of columns. All of the information to the right of MatchID are details about HOW the player (Player_ID) performed in the match (MatchID). Should they be in another table?

Link to other tables if you would like to see the layout so far: https://i.sstatic.net/5GIUG.png

Please ignore that MatchID doesn't have an underscore and the other ID's do, It's only an excel plan!


Solution

  • Unless the same player can participate in the same match more than once, you'll have to have a composite key {Player_ID, MatchID}, whether you add another key (such as {Participation_ID}) or not.

    Adding {Participation_ID} key only makes sense if you have some other tables that reference it1 and you want to make their foreign keys slimmer, or if you use a particularly hostile ORM that requires a non-composite primary key.

    Do the columns to the right of MatchID Have a functional dependency on the Compound key

    Yes.

    You can think of a "functional dependency" simply as a way for saying that the relation (a set of tuples) is a function. For a relation to be function (in the mathematical sense of that word), it must always produce same "result" for same "arguments".

    If the attributes of the given key are the "arguments" and the rest of the attributes are the "result", then no two different results can ever be produced from the same arguments, simply because the key is unique and therefore any particular combination of values of key attributes2 cannot identify more than one tuple.

    So all attributes are always functionally dependent on the key. That is always true for any key, otherwise it wouldn't be a key.

    The only question is whether some non-key attribute is also dependent on the proper subset of the key attributes. If it is, you have violated the 2NF.3

    In your case, if any of the attributes depends on Player_ID alone (or MatchID alone), that would violate the 2NF.

    The final thing that I am a little in doubt about, is the huge number of columns. All of the information to the right of MatchID are details about HOW the player (Player_ID) performed in the match (MatchID). Should they be in another table?

    Looks like they are where they need to be from the logical standpoint. It is unlikely, but possible, that you might have some physical reasons for vertically partitioning the data.4

    Some unrelated suggestions:

    • Use consistent naming: if there is Player_ID, there should be Match_ID, not MatchID (or vice-verse). Whops, I missed your last sentence.
    • Use singular for table names, for the same reason singular is typically used for class names in OOP.

    1 Which you don't as far as I can see.

    2 Aka. "prime" attributes. Strangely enough, a prime attribute does not have to belong to a "primary" key (it can belong to an alternate key), so just saying "key attributes" is probably a better terminology, IMHO.

    3 Obviously, this is only a concern for composite keys, because if a key has only one attribute, its proper subset is empty.

    4 DBMSes can typically handle hundreds or even thousands of columns these days, and this doesn't really qualify as "huge number of columns".