Search code examples
phpmysqlnormalizationdenormalization

Should I make another table or just use arrays? (To normalize or not to normalize)


The current situation is that topics are sorted by 3 main categories. There is the potential to add more than just the 3 categories but the higher ups wish to implement the ability to add more than just 1 category to a topic.

My original db design has the the categoryID as a Foreign Key in the topic info table. Which was probably a bad idea from the start but I figured they were set on only having 3 categories and doing it this way would allow for fewer queries.

So from what I can see I have two options now: 1) Input the categoryID as a comma separated string that I parse on the php end. 2) Restructure the DB and pull out the categoryID into its own table of categoryID and topicID.

I was wondering what everyone thought of this. My first instinct was to restructure the database. But the first option when I think about it is the easiest to implement and least likely to break something existing by changing the db around. This can also lead to de-normalization however and open up the possibility of inconsistent data.

I have read the de-normalizing is fine so long as you accept the risk of having inconsistent data in exchange for performance. In your opinion will I gain much in performance for this risk? Any input on what I should do in this situation would be appreciated.

Thanks for the help,
Levi


Solution

  • Do not confuse denormalization (a good example of which is keeping the number of votes on SO question together with question as opposed to calculating it each time from 'votes' table) with the abomination that is comma-separated list of ids.

    Model a proper many-to-many relationship; there are just so many things that can (and will) go wrong with comma-separated approach. To name a few:

    1. No referential integrity.
    2. Next to impossible to use in joins.
    3. Impossible to adequately index; non-scalable.