database relational-database functional-dependencies

Determining Candidate Keys from Functional Dependencies

If I Have R(E, F, G, H), what would be the candidate keys from these functional dependencies?

    FD1: EF -> G
    FD2: EF -> H
    FD3: G -> E
    FD4: H -> F

My thought process was that EF would be considered a candidate key, since EF -> G and EF -> H, therefore EF+ = {E, F, G, H}. Could I say the same in saying that GH is also a candidate key, since G -> E, H -> F, therefore GH -> EF and GH+ = {E, F, G, H}? Would there be any other candidate keys?

Solution

The schema has four candidate keys: EF, EH, FG, GH. You can easily verify this fact by computing the closure of each pair of attributes, and noting that it contains all the attributes.

The question is naturally how to find them. The trivial method is simply to try the closure of all the subsets of attributes of the relation, but this is obviously inefficient, being an exponential process.

There are more efficient algorithms to find all the candidate keys, but they are quite complex. There are simple heuristics that can help in reducing the complexity of the solution, without using a formal algorithm.

First, you should start from a canonical cover, otherwise these heuristics cannot be applied (in your example you have already a canonical cover). The first step is that you can exclude any attribute that appears only in the right hand sides of the dependencies (not in this case), and consider that all the attributes appearing only in left hand sides must be always part of any key (also not in this case).

Then, you can start from the left hand sides of the dependencies, and compute their closures to see if those sets of attributes can determine all the others. If this is not the case, you can add the other attributes, one at time, and again compute the closure of the resulting set, stopping considering those attributes when you have found a key or the set includes a subset already considered.

For instance, from EF you have found that you can determine all the other attributes, so this is a candidate key. Then, considering G, you can add E, noting that EG+ = EG, so this is not a candidate key, then add H, noting that GH+ = EFGH, so this is a candidate key, and finally add F, finding that FG is a candidate key. Of course, when a set of attributes is a candidate key you do not add to it other attributes. Another set of tests starts with H, first HE (which produces a candidate key), then HF, which do not produce a candidate key. At this point we should check if adding an attribute to EG or to HF we obtain a candidate key, but we can safely stop here since we will obtain just a superset of a set already considered (like EGF, for instance, that contains GF).