Search code examples
phpmysqlsqlinsertdatabase-performance

How to INSERT tags in three-table system


Although some major systems like Joomla store tags as comma-separated text in the main article database, normalized system of three tables as article, tags and tag-relationship is preferred (as others like Wordpress uses). There are lots of discussions and questions about structure and reading; but I was unable to find the best INSERT command, as we need to insert into three tables. How to quickly run this process through one SQL run? Or we need to first insert article, then each tags, and finally writing the relationships?

Another question is about the uniqueness of the tags. The main advantage of this system is that we only need to store each term only once (then connecting to corresponding articles). Is it practical to use mysql UNIQUE to avoid duplication? Or (as I read somewhere) we need to read the entire list of tags as an array to find any duplication to catch the tag ID and avoid storing the term?

Will the whole process as three individual steps:

  1. INSERT the article
  2. INSERT tags with UNIQUE but regardless of their relationship
  3. Finding each tag ID and make a relationship to the article ID

Am I right? The reason that I asked is that I saw people catch the tags as an array and make a comparison. To me it is very slow, and kills the performance, particularly for UPDATE.


Solution

  • You can only ever insert in one table at a time.

    One solution is to use triggers, the other is to use a transaction.
    The first can be used with any engine, the latter requires InnoDB or alike engine.

    Make sure you put a UNIQUE index on the field tag.name.

    1-Using transactions

    START TRANSACTION;
    
    INSERT IGNORE INTO tag (name) VALUES ('$example1', '$example2');
    INSERT INTO article (title, body) VALUES ('$title','$body');
    SET @article_id = LAST_INSERT_ID();
    INSERT INTO tag_link (tag_id, article_id) 
      SELECT t.id, @article_id FROM tag t WHERE t.name IN ('$example1','$example2');
    
    COMMIT;
    

    2-Using a trigger on a blackhole table

    Create a table of type blackhole with the following fields.

    title: varchar(255)
    body: text
    tag1: varchar(50) DEFAULT NULL
    tag2: varchar(50) DEFAULT NULL
    ...
    add as many tags as you want.
    

    Add a AFTER INSERT trigger to the blackhole table to do the actual storage for you.

    DELIMITER $$
    
    CREATE TRIGGER ai_bh_newacticle_each AFTER INSERT ON bh_newacticle FOR EACH ROW
    BEGIN
      INSERT IGNORE INTO tag (name) VALUES (new.tag1, new.tag2,...,new.tag25);
      INSERT INTO article (title, body) VALUES (new.title,new.body);
      SET @article_id = LAST_INSERT_ID();
      INSERT INTO tag_link (tag_id, article_id) 
        SELECT t.id, @article_id FROM tag t 
        WHERE t.name IN (new.tag1, new.tag2,...,new.tag25);
    END$$
    
    DELIMITER ;
    
    DELIMITER $$
    

    Now you can just insert the article with tags in a single statement:

    INSERT INTO bh_newarticle (title, body, tag1, tag2, tag3) 
      VALUES ('$title','$body','$tag1','$tag2','$tag3');
    

    Back to your question

    Am I right? The reason that I asked is that I saw people catch the tags as an array and make a comparison. To me it is very slow, and kills the performance, particularly for UPDATE.

    Tags are only useful if there are i a limited number of them. If you put a (unique) index on tag.name looking for a tag will be very fast, even with 10.000 tags. This is because you are looking for an exact match. And if you are really in a hurry you can always make the tag table a memory table with a hash index on the name field.
    I doubt you need to worry about slowness in the tag lookup though.

    Just make sure you don't allow too many tags per article. 5 seems a good start. 10 would be too many.

    Links
    http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html
    http://dev.mysql.com/doc/refman/5.0/en/blackhole-storage-engine.html