Search code examples
database-designfunctional-dependencies

How do you determine functional dependencies and a primary key?


In my Oracle Database Programming course, the first part of our final lab assessment requires that we:

  • Identify the primary key of the table as it is currently shown.
  • Find all functional dependencies of the table.
  • Draw the dependency diagram for the table.

The table is in 1NF.

I need to combine every possible FD, which would not only consume a very large amount of time but seems bizarre considering they want us to then map these relationships in the dependency diagram. This would cause everything to link to everything - and this is why I believe I do not understand functional dependencies.

A functional dependency says that in R, X->Y, where Y is not produced by anything other than X, and enables you to determine every other value in the table.

X and Y can consist of more than one attribute. But the number of dependencies I would come up with is astounding, and I don't want to waste time doing something the wrong way.

Do I provide all fully functional dependencies, partial dependencies, and transitive dependencies?

My table consists of 10 columns in 1NF - thus, A-J would be my attributes. R(AD) constitutes a primary key, but I don't know if I need to derive the PK from laying out all of the FD's, or if I can choose a PK and find my FD's from that.

Do I lay out every FD, given that my PK will really determine the mapping of the relationships within the design?

How do you determine functional dependencies and a primary key?

https://www.dropbox.com/s/3vwo1axe7a1i20s/final%20lab%20instructions.pdf?dl=0


Solution

  • You must know what the tables mean and what the business rules are.

    From a question of yours apparently re the same design, Identifying functional dependencies (FDs):

    Parentheses identify the primary key attributes:

    (Student ID), Student Name, Student Address, Student Major, (Course ID), Course Title, Instructor ID, Instructor Name, Instructor Office, Student_course_grade

    Apparently, the table (using shortened column names) means something like:

    /*
    student with id [si] and name [sn] and address [sa] and major [sm]
        takes course [ci] with title [ct]
        from instructor with id [ii] and name [in] and office [io]
        with grade [scg]
    */
    t(si,sn,sa,sm,ci,ct,ii,in,io,scg)
    

    A table comes with such a (characteristic) predicate (aka membership criterion, aka meaning)--a fill-in-the-(named-)blanks statement template parameterized by column names. Plugging a row of values for the column names into the predicate gives a proposition--a true-or-false statement. A table (base table or query result) holds the rows that make a true proposition from its predicate.

    Finding appropriate predicates for recording just what your business needs is what database design is about. A predicate might be expressible as the AND/conjunction of a other smaller predicates. Decomposing it into them when helpful is what database normalization is about.

    You must use a table's predicate/criterion/meaning and the business rules to figure out for every subset of columns when a given value for it always appears with one value for another column, i.e. what FDs (functional dependencies) hold. We can talk about FDs holding in a table value or in a table variable/schema. Similarly for superkeys, CKs (candidate keys), PKs (primary keys) and other constraints. A FD holds in a variable/schema iff/when it holds in every possible value/state that can arise for that variable/schema given the business rules.

    Note that if a subset determines a column then all supersets of that column determine that column. You can also rule out FDs when you know that a given subset can appear with more than one value of a column. Example data might suggest FDs hold or show they don't hold. Learn about the "transitive closure" and "minimal cover" of a set of FDs to express all FDs concisely. You must show that you have accounted for every possible determinant, i.e. for every subset of columns, and for every attribute, whether the FD holds having that set determining that attribute.

    Every time you decompose you have to decide whether you want a given component only ever to hold the projection of rows of the original on its columns or whether it could hold other rows as well. E.g. can a professor exist without teaching a course? If so it means that your original predicate/table is not actually sufficient to say all the things that you want to about your application and you need an extra table that is not a projection of it but has the same columns. This is often described as part of normalization but it isn't. It is noticing that your design was wrong while you are normalizing.

    You may have to hypothesize because you don't know what situations can arise. We normally assume ids are 1:1 with identified entities; is that so? Can a student have more than one name? If not then {si} -> sn. Even if they do, is the table going to only record one? If so, the predicate becomes "... and has name [sn] that has been chosen to be recorded..." and all your queries that involve student names are going to involve those that "have been chosen to be recorded" and the above FD holds.

    • Only one class is taught for each course ID.
    • Students may take up to 4 courses.
    • Each course may have a maximum of 25 students.
    • Each course is taught by only one Instructor.
    • Each student may have only one major.

    What's a class? Are there subrows of values that are 1:1 with classes, so that what goes for classes goes for those subrows?

    Does ci determine ct? Does ii determine in? From the rules, "Each course is taught by only one instructor", and the table holds the rows where "... takes course [ci] ... from instructor with id [ii] ...". So a given ci only appears with one ii. FD. From the rules, "Each student may have only one major", and the table holds the rows where "student with id [si] ... and major [sm] ...". So a given si only appears with a single sm. FD. From the rules, "Students may take up to 4 courses" and "Each course may have a maximum of 25 students", and the table holds the rows where "student with id [si] ... takes course [ci] ...". So si does not determine ci and course does not determine si.

    Unanswered questions must be resolved between you and your client (professor).

    You can only determine CKs, one of which you can call the PK, and other superkeys--unique column sets--from FDs. If you are given CKs, PK and/or other superkeys, by definition they functionally determine every column.

    (Your assignment uses E-R so you need to find predicates about entities, then find corresponding ones about column sets that identify entities, which get tables.)

    From my answer to your other question:

    For every subset of columns you need to decide which other columns can only have one value for a given subrow value for those columns. When it can only have one we say that the subset of columns functionally determines that column. We say that there is a FD (functional dependency) columns->column. This is when we can express the table's predicate as "... AND column=F(columns)" for some function F. (F is represented by the projection of the table on the column & columns.) But every superset of that subset will also functionally determine it, so that cuts down on cases. Conversely, if a given set does not determine a column then no subset of the set does. Applying Armstrong's axioms gives all the FDs that hold when given FDs hold. (Algorithms & software are available to apply them & determine FD closures & covers.) Also, you may think in terms of column sets being unique; then all other columns are functionally dependent on that set. Such a set is called a superkey.

    The reason why applying the business rules and table predicates to find whether a FD holds only needs to be done to the FDs that determine a single attribute is that Armstrong's axioms will afterwards find which FDs hold that determine more than one.