database database-design database-normalization

First Normal Form Ambiguity?

In the attached image, 1NF for the data is given as

Pet [ pet_id, pet_name, pet_type, pet_age, owner ]
Pet_Visit [ pet_id, visitdate, procedure_no, procedure_name ]

Where the table name precedes a list of fields with primary keys in bold.

However, I wonder if

Pet [ pet_id, pet_name, pet_type, pet_age, owner, visitdate, procedure_no, procedure_name ]

would also qualify, since it seems to meet the requirements of "Each record should be uniquely identified by a key" and "each cell can contain only one item"? Or perhaps adding a visit_id field but still keeping one table, as in

Pet [ pet_id, pet_name, pet_type, pet_age, owner, visit_id, visitdate, procedure_no, procedure_name ] ?

More generally, does 1NF usually require splitting a table with "repeating groups" into more than one table, or can we technically keep it all in one table if a composite key can be found/created?

I've seen many examples with apparently varying definitions. I'm not looking for advice on how I should approach learning this topic, just answers to the questions I have asked. Thanks in advance.

Solution

You are correct thinking that keeping all the data together can still be qualified as 1NF. Splitting the data is not a necessary condition to have the 1NF.

In fact, this is exactly what differentiate generic “tables” (like those of Excel and similar programs), from “relations”, which is the basic data structure of relational model, structures managed by Database Management Systems, and for which the different normal forms (like 2NF, 3NF, Boyce-Codd Normal Form, etc.) have been “invented”: a relation is like a table, with three very important differences:

Each cell has one and only one value
The order of the rows is not important
The rows are all different

Conditions 2 and 3 means that we are in presence of a set, and note that it is not necessary to have a key (or candidate key, as it is usually called) formed by a strict subset of the attributes: simply a candidate key could be composed by all the attributes, also in this case it is true that all the rows are different.

So, passing from “tables” to “relations” means that all the above three conditions are true, and allow the building of the so-called “normalization theory”, which, even if “theory”, has important practical consequences on data management techniques.

Finally, note that for some people the 1NF can be properly defined only when we are already talking about relations, i.e. sets, so, it reduces to the first condition (each cell has one and only one value). So, when a relation is not in 1NF, one can talk of non-1NF relations, where each datum can have multiple values. This is not really important, since in the normalization theory the “important” normal forms are the Third, the Boyce-Codd’s, and higher ones.