Search code examples
sqlmysqloptimizationnormalization

Is it really worth it to normalize the "Toxi" way? ( 3NF )


I'm in the early stages of my database design so nothing is final yet, and I'm using the "TOXI" 3-table design for my threads which have optional tags, but I can't help but feel that the joining is not really necessary and perhaps I need to just rely on a simple tags column in my posts table where I can just store a varchar of something like <tag>, <secondTag>.

So to recap:

  • is it worth the trouble of the extra left joins on the 2 tag tables instead of just having a tag column in my posts table.
  • is there a way I can optimize my query?

Schema

CREATE TABLE `posts` (
    `post_id` INT UNSIGNED PRIMARY AUTO_INCREMENT,
    `post_name` VARCHAR(255)
) Engine=InnoDB;

CREATE TABLE `post_tags` (
    `tag_id` INT UNSIGNED PRIMARY AUTO_INCREMENT,
    `tag_name` VARCHAR(255)
) Engine=InnoDB;

CREATE TABLE `post_tags_map` (
    `map_id` INT PRIMARY AUTO_INCREMENT,
    `post_id` INT NOT NULL,
    `tags_id` INT NOT NULL,
    FOREIGN KEY `post_id` REFERENCES `posts` (`post_id`),
    FOREIGN KEY `post_id` REFERENCES `post_tags` (`tag_id`)
) Engine=InnoDB;

Sample Data

INSERT INTO `posts` (`post_id`, `post_name`)
  VALUES
(1, 'test');

INSERT INTO `post_tags` (`tag_id`, `tag_name`)
  VALUES
(1, 'mma'),
(2, 'ufc');

INSERT INTO `posts_tags_map` (`map_id`, `post_id`, `tags_id`)
  VALUES
(1, 1, 1),
(2, 1, 2);

Current query

SELECT 
    posts.*,
    GROUP_CONCAT( post_tags.tag_name order by post_tags.tag_name ) AS tags

  FROM posts
    LEFT JOIN posts_tags_map
      ON posts_tags_map.post_id = posts.post_id
    LEFT JOIN post_tags
      ON posts_tags_map.tags_id = posts_tags.tag_id

  WHERE posts.post_id = 1
  GROUP BY post_id

Result

IF there are tags:

post_id     post_name        tags
1             test           mma, ufc

Solution

  • Having all tags in different records (normalized) means that you'll be able to rename the tags more easily should the need arise and track the tag name history.

    SO, for instance, renamed SQL Server related tags at least thrice (mssql -> sqlserver -> sql-server).

    Having all tags in one record (denormalized) means that you can index this column with a FULLTEXT index and search for posts having two or more tags at once:

    SELECT  *
    FROM    posts
    WHERE   MATCH(tags) AGAINST('+mma +ufc')
    

    which is possible too but less efficient with normalized design.

    (Don't forget to adjust @ft_min_word_len to index tags of 3 characters or less for this to work)

    You can combine both designs: store both the map table and the denormalized column. This will require more maintenance, though.

    You can also store the normalized design in your database and use the query you provided to feed the tags to Sphinx or Lucene.

    This way, you can do history digging with MySQL, fulltext tag searches using Sphinx, and no extra maintenance will be required.