database-design modeling missing-data representation

Handling missing values in a database schema?

Suppose I have the following table schema

MONTH VARCHAR(10)
HASRAIN BOOLEAN
YEAR INTEGER

Now suppose for a particular year, I have values for 11 months, but no value for the the left over month (let's call it September).

Is it more correct to:

Put a record in with a zero value, or
Leave no record for that month

My question is: How do you model the absence of information in a database schema?

Solution

If information about an entity is not needed to understand another entity, these are not cognitively dependant on each other and can be normalised.

What this means in general practice is that you should make separate tables for the two entities and use foreign keys to refer between them.

Imagine this:

Table weather_month:
------------
month ENUM('Jan', 'Feb', Mar' ...) NOT NULL,
year mediumint NOT NULL,
weather_id mediumint,
PRIMARY KEY(month, year)

Table weather:
------------
weather_id mediumint NOT NULL
rain BOOLEAN NOT NULL,
clouds ENUM('Clear', 'Sparse', 'Thick', 'Cumulus', ...) NOT NULL,
temperature ENUM('Freezing', 'Thawing', 'T-shirt weather', 'Hot', ...) NOT NULL

This example shows how we either know what the weather is, or we don't.

If you would put lots and lots of nullable fields into the weather_month table (as is quite a quite common way of doing this), it would not be as clear as in this setup that you have either observed the weather or you did not.

Relational databases use NULL for "I have no value", but you should really consider how you use it, because the database design you make should speak for your data structure and if it speaks for itself, you won't have to explain it to people. Saves you time.