database database-design database-schema database-normalization

problem related to atomicity in database table

I am creating a forum page for which i have created following database schema:

Forum(questionId, postedByUserId, questionSubject, questionBody, TagIds);

Tags(tagId, tagName);

Entries in Forum will be something like:

(1, 1, 'sample subject', 'sample body', '1 4 2') ...

And sample entries of Tags will be:

(1, 'C'), (2, 'C++'), (3, 'Java'), (4, 'Data Structure') ...

Now the problem is that first normal form says that all the fields should be atomic which is not satisfied in this case but i think space is saved as if i were creating a new table of forum_tag(questionId, tagId); then i think this will take more space on database, but would be correct conceptually.

So i don't know what should i do whether to do what i am doing right now or to make the coloumns atomic as per the normalization.

Please explain which is better and why because there are many cases when i found such problem but all the time i remain ambiguous that what should i do!

So please help.

Thanks in advance :)

Solution

Space is cheap in a database. Retrieval time, which varies with space, is much less cheap. however, retrieval time can also be affected by whether a keyed access strategy will work, and will be chosen by the query optimizer. The effect can be dramatic.

Consider the following retrieval oon the schema you proposed: find all the forum entries where one of the related tags is "4". For most DBMSes, this query will require a sequntial scan through the entire Forums table. Depending on data volume, this could be millions of disk I/O s.

Now consider a junction table

ForumTags (ForumId, TagId) primary key (ForumId, TagId)

Further, let's say that there's an index on TagId in addition to the automatic index on (ForumId, TagId)

The same query would result in a index lookup of value "4" in one of the indexes, and require as few as a dozen disk I/Os.

One of the goals of normalization is keyed access to all data. first normal form is pursuant to that goal.

I've had real life situations where a first normal form or better schema can be compared to a schema with embedded lists. The speed difference in those cases was something like 50 to 1.