Search code examples
database-normalization

How to normalize a doctor table to follow 2NF?


There is a base table called doctor in my database where I have the columns

Name, d_ID, Age, Gender, Contact_NO, Speciality, beg_Date, End_Date

I wish to normalize my table. I have found the dependencies for doctor table as follows:

Name, d_ID ---> Age, gender, Speciality
d_ID----> Name, Contanct_NO, Beg_Date, End_Date

There are a few more base tables with a similar structure.

I have computed the closures and found that I have 2 candidate keys which are {d_ID} and {Name,d_ID}. I chose {d_ID} to be the primary key and {Name,d_ID} to be the secondary key.

My question is:

  1. I want to know if my table is in 2NF already. If not, please let me know how to break down the relation?

  2. I have an intermediate table called patient_record which has, doctor id, patient id, nurse id, bed id (foreign key) and so on.My confusion lies where, if normalization has to be only done to the intermediate tables and not the other base tables. I believe this, because the base tables would only have unique identifiers for their columns and hence they would automatically fall under 2NF?


Solution

  • i computed the closures and found that i have 2 candidate keys which are {d_ID} and {Name,d_ID} (Please correct me if i am wrong).

    No. By definition, candidate keys are irreducible. If d_ID is a candidate key, then {Name, d_ID} is not. {Name, d_ID} is not a candidate key, because it's reducible. Drop the attribute "Name", and you've got a candidate key (d_ID).

    1) i want to know if my table is in 2NF already. If not, please let me know how to break down the relation?

    It's really hard to say in this case. Although you have a unique ID number for every doctor, in your case it only serves to identify a row, not a doctor. Your table allows this kind of data.

    d_ID   Name         Age  Gender  Contact_NO     Speciality   beg_Date    End_Date
    --
    117    John Smith   45   M       123-456-7890   Cardio       2013-01-01  2015-12-31
    199    John Smith   45   M       123-456-7890   Cardio       2013-01-01  2015-12-31
    234    John Smith   45   M       123-456-7890   Cardio       2013-01-01  2015-12-31
    

    How many doctors are there? (I made up the data, so I'm really the only one who knows the right answer.) There are two. 234 is an accidental duplicate of 117. 199 is a different doctor than 117; it's just a coincidence that they're both heart specialists at the same hospital, and their hospital privileges start and stop on the same dates.

    That's the difference between identifying a row and identifying a doctor.

    Whether it's in 2NF depends on other functional dependencies that might not yet be identified. There might be several of these.

    2) i have an intermediate table called patient_record which has the doctor id, patient id, nurse id, bed id (foreign key)and so on. i am confused if normalization has to be only done to intermediate tables and not the other base tables.

    Normalization is usually done to all tables.

    Because base tables would only have unique identifiers for the columns and hence they would automatically fall under 2NF?

    No, that's not true. For clarification, see my answer to Learning database normalization, confused about 2NF.


    Identifying a row and identifying a thing

    It's a subtle point, but it's really, really important.

    Let's look at a well-behaved table that has three candidate keys.

    create table chemical_elements (
      element_name varchar(35) not null unique,
      symbol varchar(3) not null unique,
      atomic_number integer not null unique
    );
    

    All three attributes in that table are declared not null unique, which is the SQL idiom for identifying candidate keys. If you feel uncomfortable not having at least one candidate key declared as primary key, then just pick one. It doesn't really matter which one.

    insert into chemical_elements
    (atomic_number, element_name, symbol)
    values
    (1, 'Hydrogen', 'H'),
    (2, 'Helium', 'He'),
    (3, 'Lithium', 'Li'),
    (4, 'Beryllium', 'Be'),
    [snip]
    (116, 'Ununhexium', 'Uuh'),
    (117, 'Ununseptium', 'Uus'),
    (118, 'Ununoctium', 'Uuo');
    

    Each of the three candidate keys--atomic_number, element_name, symbol--unambiguously identifies an element in the real world. There's only one possible answer to the question, "What is the atomic number for beryllium?"

    Now look at the table of doctors. There's more than one possible answer to the question, "What is the ID number of the doctor named 'John Smith'?" In fact, there's more than one possible answer for the very same doctor, because 234 and 117 refer to the same person.

    It doesn't help to include more columns, because the data is the same for both doctors. You can get more than one answer to the question, "What's the ID number for the 45-year-old male doctor whose name is 'John Smith', whose phone number is 123-456-7890, and whose specialty is 'Cardio'?"

    If you find people making appointments for these two doctors, you'll probably find their ID numbers written on a yellow sticky and stuck on their monitor.

    • Dr. John Smith who looks like Brad Pitt (!), ID 117.
    • Other Dr. John Smith, ID 199.

    Each ID number unambiguously identifies a row in the database, but each ID number doesn't unambiguously identify a doctor. You know that ID 117 identifies a doctor named John Smith, but if both John Smiths were standing in front of you, you wouldn't be able to tell which one belonged to ID number 117. (Unless you had read that yellow sticky, and you knew what Brad Pitt looked like. But that information isn't in the database.)

    What does this have to do with your question?

    Normalization is based on functional dependencies. What "function" are we talking about when we talk about "functional dependencies"? We're talking about the identity function.