Tags: mysql, rdbms, database-partitioning, data-partitioning

Determining partitioning key in range based partitioning of a MySQL Table


I've been researching database partitioning in MySQL for a while. Since I have one ever-growing table in my DB, I thought of using partitioning as an effective tool to optimize it. I'm only interested in retaining recent data (say, the last 6 months), and the table has a column named 'CREATED_AT' (TIMESTAMP, non-primary), so the approach that came to mind is as follows:

  • Create a time-based range partition on the table by using 'CREATED_AT' as the partition key.
  • Run a DB-level EVENT periodically that drops obsolete partitions (older than 6 months).
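
For the second step, what I have in mind is roughly the following sketch (the table and partition names are placeholders, and the event scheduler would have to be enabled):

    -- hypothetical sketch; requires SET GLOBAL event_scheduler = ON
    -- a real version would have to work out the obsolete partition name dynamically
    CREATE EVENT purge_old_partitions
      ON SCHEDULE EVERY 1 MONTH
      DO
        ALTER TABLE mytable DROP PARTITION p20180101;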

However, the partitioning can only be realized if I make the 'CREATED_AT' field part of the primary key. But doesn't that violate the primary key principle? Since the field is non-unique and can have tons of rows with the same value, doesn't marking it as primary turn out to be an anti-pattern? Is there any workaround to achieve time-based range partitioning in this scenario?


Solution

  • This is a problem that prevents many MySQL users from using partitioning.

    The column you use for your partitioning key must be in every PRIMARY KEY or UNIQUE KEY of the table. It doesn't have to be the only column in those keys (because keys can be multi-column), but it has to be part of every unique key.

    Still, for many tables, adding the partitioning column to the primary key would violate the logical design of the table. So partitioning is not practical.
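
    For illustration (the table name below is arbitrary, and the exact error text varies a little between versions), keeping a natural single-column primary key and then asking for partitioning by created_at fails immediately:

    create table mytable_bad (
     id bigint auto_increment not null,
     created_at datetime not null,
     primary key (id)  -- does not include created_at
    ) partition by range columns (created_at) (
     partition pMAX values less than (MAXVALUE)
    );
    ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function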

    You could grit your teeth and design a table with partitions that has a compromised design:

    create table mytable (
     id bigint auto_increment not null,
     created_at datetime not null,
     primary key (id, created_at)
    ) partition by range columns (created_at) (
     partition p20190101 values less than ('2019-01-01'),
     partition p20190201 values less than ('2019-02-01'),
     partition p20190301 values less than ('2019-03-01'),
     partition p20190401 values less than ('2019-04-01'),
     -- etc...
     partition pMAX values less than (MAXVALUE)
    );
    

    I tested this table and there's no error when I define it. Even though this table technically allows multiple rows with the same id value if they have different timestamps, in practice you can code your application to just let id values be auto-incremented, and never change the id. As long as your code is the only application that inserts data, you can more or less have some assurance that the data doesn't contain multiple rows with the same id.
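
    For example (made-up values, just to show the point), both of these inserts succeed, because only the (id, created_at) pair has to be unique:

    insert into mytable (id, created_at) values (1, '2019-01-15 10:00:00');
    insert into mytable (id, created_at) values (1, '2019-02-15 10:00:00');
    -- two rows now share id = 1; nothing in the schema prevents it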

    You might think you can add a secondary unique key constraint to enforce that id must be unique by itself. But this violates the partitioning rules:

    mysql> alter table mytable add unique key (id);
    ERROR 1503 (HY000): A UNIQUE INDEX must include all columns in the table's partitioning function
    

    You just have to trust that your application won't insert invalid data.

    Or else forget about using partitioning, and instead just add an index to the created_at column, and use incremental DELETE instead of using DROP PARTITION to prune old data.
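
    A minimal sketch of that approach (the index name and batch size are arbitrary choices):

    alter table mytable add index idx_created_at (created_at);

    -- run repeatedly (e.g. from an event or a cron job) until 0 rows are affected;
    -- the LIMIT keeps each delete small so row locks are held only briefly
    delete from mytable
    where created_at < now() - interval 6 month
    limit 1000;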

    The latter strategy is what I see used in almost every case. Usually it's important to have the RDBMS enforce strict uniqueness on the id column; it's not safe to leave that uniqueness unenforced.


    Re your comment:

    Isn't dropping an entire partition a much cheaper operation than performing incremental deletes?

    Yes and no.

    DELETE can be rolled back, so it results in some overhead, like temporarily storing data in the rollback segment. On the other hand, it locks only the rows that match the index search.

    Dropping a partition isn't subject to rollback, so it can skip some of that overhead. But it is an ALTER TABLE, so it needs to acquire a metadata lock on the whole table first. Any concurrent query, read or write, will block it and be blocked by it.

    Demo:

    Open two MySQL client windows. In the first session do this:

    mysql> START TRANSACTION;
    mysql> SELECT * FROM mytable;
    

    This holds a metadata lock on the table, which blocks things like ALTER TABLE.

    In the second window:

    mysql> ALTER TABLE mytable DROP PARTITION p20190101;
    <pauses, waiting for the metadata lock held by the first session!>
    

    You can even open a third session and do this:

    mysql> SELECT * FROM mytable;
    <also pauses>
    

    The second SELECT is waiting behind the ALTER TABLE. They are both queued for the metadata lock.

    If I commit the first SELECT, then the ALTER TABLE finally finishes:

    mysql> ALTER TABLE mytable DROP PARTITION p20190101;
    Query OK, 0 rows affected (6 min 25.25 sec)
    

    That 6 min 25 sec isn't because it takes a long time to do the DROP PARTITION. It's because I had left my transaction uncommitted that long while writing this post.

    Metadata lock waits don't time out the way InnoDB row lock waits do (innodb_lock_wait_timeout defaults to 50 seconds). The default metadata lock timeout (lock_wait_timeout) is 1 year! See https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_lock_wait_timeout
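
    If you do use DROP PARTITION, one mitigation (a sketch; 10 seconds is an arbitrary choice) is to lower that timeout for the session running the ALTER, so it gives up quickly instead of queueing behind a long transaction:

    set session lock_wait_timeout = 10;  -- default is 31536000 seconds (1 year)
    alter table mytable drop partition p20190101;
    -- if the metadata lock can't be acquired within 10 seconds, the ALTER fails
    -- with a lock wait timeout error instead of leaving other queries queued behind it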

    Statements like ALTER TABLE, DROP TABLE, RENAME TABLE, and even things like CREATE TRIGGER need to acquire a metadata lock.

    So in some cases, depending on whether you have long-running transactions holding onto metadata locks, it could be better for your concurrent throughput to use DELETE to remove data incrementally, even though it takes longer.
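
    As an aside, if you want to check whether long-running transactions are currently holding table metadata locks, the sys schema has a view for this (assuming MySQL 5.7+ with the performance_schema metadata lock instrument enabled):

    -- shows sessions waiting for and holding table metadata locks
    select * from sys.schema_table_lock_waits\G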