Search code examples
databasedb2redundancydatabase-optimization

which is better ? Reduncancy with faster access of data , or no redundancy and slower data access


i want to create a database for a forum website...

All the users of the forum website will be stored in a table named USERS with the following fields :

user_name
user_ID
(and additional details)

there will be a single table named FORUMS with the following fields :

forum_ID
forum_creatorID(which is the ID of one of the users)
forum_topic
replies
views

And for each forum(for each row in the FORUMS table) created, there'll be a separate table which is named as "forum_ID"_replies , where the exact forum_ID of that forum will be replaced within the quotes...
thus, each forum will have a separate table where all the replies for that particular forum will be saved...

the fields in the "forum_ID"_replies table are

user_ID
user_name
comment
timestamp(for the comment)

I hope i made my design clear... now, my doubt is

I saved user_name as one of the fields in each "forum_ID"_replies . But, i think the user_name can be referred(or accessed) from USERS table using the user_ID , instead of storing it in each "forum_ID"_replies table. In this way, redundancy is reduced.

But, if user_name is stored in each table, the search for user_name will be reduced , and result can be displayed faster.

Which is more optimal ?

Storing names along with their IDs for faster access, or storing only the IDs to avoid reduncancy ?


Solution

  • "Optimal", "better" etc. are all subjective.

    Most database designers would have several problems with your proposal.

    Database normalization recommends not duplicating data - for good reason. What happens if your user changes their username? You have to update the user table, but also find all the "forum_id"_replies tables where their username occurs; if you mess that up, all of a sudden, you have a fairly obvious bug - people think they're replying to "bob", but they're actually replying to "jane".

    From a performance point of view, unless you have esoteric performance demands (e.g. you're running Facebook), the join to the user table will have no measurable impact - you're joining on a primary key column, and this is what databases are really, really good at.

    Finally, creating separate tables for each forum is not really a good idea unless you have huge performance/scalability needs (read: you're Facebook) - the additional complexity in maintaining the database, building queries, connecting your apps to the database etc. is significant; the performance overhead of storing multiple forums in a single table usually is not.

    "Better" depends on your criteria. If (as you write in the comments) you are concerned about scalability and supporting huge numbers of posts, my recommendation is to start by building a way of testing and measuring your scalability levels. Once you can test and measure, you can test different solutions, and know whether they have a material impact - very often, this shows counter-intuitive outcomes. Performance optimizations often come at the expense of other criteria - your design, for instance, is more error prone (repeated information means you can get discrepancies) and more expensive to code (writing the logic to join to different tables for each forum). If you can't prove that it has a material benefit in scalability, and that this benefit meets your business requirements, you're probably wasting time & money.

    You can use tools like DBMonster to populate your database with test data, and JMeter to run lots of concurrent database queries - use those tools to try both solutions, and see if your solution is, indeed, faster.