mysql, scalability, partitioning

MySQL 'Partitioning' vs splitting data into different tables


We have a MySQL table called posts_content.

The structure is as follows :

CREATE TABLE IF NOT EXISTS `posts_content` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `post_id` int(11) NOT NULL,
  `forum_id` int(11) NOT NULL,
  `content` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=79850 ;

The problem is that the table is getting pretty huge: many gigabytes of data (we have a crawling engine).

We insert data into the table on a daily basis but seldom retrieve it. Now that the table is so large, it is becoming difficult to handle.

We discussed two possibilities

  1. Use MySQL's partitioning feature to partition the table by forum_id. There are about 50 forum_ids, so there would be about 50 partitions. Note that each partition would itself eventually grow to many gigabytes of data, and might even need its own drive.
  2. Create separate tables for each forum_id and split the data like that.
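
To make the two options concrete, here is a rough sketch of what each might look like. The table names posts_content_partitioned and posts_content_forum_1 are made up for illustration, and the choice of HASH partitioning and of InnoDB are assumptions, not something we have settled on. Note that MySQL requires every unique key (including the primary key) to contain the partitioning column, so the primary key has to change for option 1.

```sql
-- Option 1 (sketch): one logical table, partitioned by MySQL on forum_id.
-- The primary key must include forum_id, because every unique key on a
-- partitioned table has to contain all columns of the partitioning expression.
-- ENGINE=InnoDB is an assumption: MySQL 8.0 no longer supports
-- partitioned MyISAM tables.
CREATE TABLE posts_content_partitioned (
  id INT NOT NULL AUTO_INCREMENT,
  post_id INT NOT NULL,
  forum_id INT NOT NULL,
  content LONGTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (id, forum_id)
) ENGINE=InnoDB
-- HASH places a row in partition MOD(forum_id, 50), so two forums can
-- share a partition if their ids collide modulo 50.
PARTITION BY HASH (forum_id)
PARTITIONS 50;

-- Option 2 (sketch): one physical table per forum, managed by the application.
-- CREATE TABLE ... LIKE copies the column and index definitions.
CREATE TABLE posts_content_forum_1 LIKE posts_content;

-- Move the existing rows for forum 1 into its own table.
INSERT INTO posts_content_forum_1
SELECT * FROM posts_content WHERE forum_id = 1;
```

With option 1 the application keeps talking to a single table name; with option 2 every insert and select has to pick the right table based on forum_id.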

I hope I have explained the problem clearly. What I need to know is which of the two options above would be the better solution in the long run, and what the advantages and disadvantages of each are.

Thank you.


Solution

  • The difference is that in the first case you leave the sharding to MySQL, and in the second case you do it on your own. MySQL won't scan any shards that cannot contain the requested data; however, if you have a query with WHERE forum_id IN(...), it may need to scan several shards. As far as I remember, in that case the operation is synchronous, i.e. MySQL queries one partition at a time, and you may want to implement it asynchronously. Generally, if you do the partitioning on your own, you are more flexible, but for simple partitioning based on forum_id, where you query only one forum_id at a time, MySQL partitioning is OK.

    My advice is to read the MySQL documentation on partitioning, especially the restrictions and limitations section, and then decide.
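
As a rough illustration of the point about scanning shards, the queries below assume a hypothetical table posts_content_partitioned that is partitioned on forum_id, and hypothetical per-forum tables named posts_content_forum_N for the do-it-yourself variant. In MySQL 5.7 and later, the EXPLAIN output has a partitions column listing the partitions that would be read; in older versions the equivalent is EXPLAIN PARTITIONS.

```sql
-- Single forum: the optimizer should be able to prune to one partition.
EXPLAIN
SELECT id, post_id
FROM posts_content_partitioned
WHERE forum_id = 7;

-- Several forums: more than one partition may be listed.
EXPLAIN
SELECT id, post_id
FROM posts_content_partitioned
WHERE forum_id IN (7, 12, 31);

-- Manual splitting: the application has to combine the
-- per-forum tables itself, for example with UNION ALL.
SELECT id, post_id FROM posts_content_forum_7
UNION ALL
SELECT id, post_id FROM posts_content_forum_12
UNION ALL
SELECT id, post_id FROM posts_content_forum_31;
```

The last query is where the extra flexibility of manual splitting comes from: the application could instead send the three selects over separate connections and merge the results, at the cost of building and maintaining that logic itself.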