Search code examples
mysqlinnodbmysql-5.7

Is partitioning still needed if the tables are already sharded in mysql?


Background: MySQL 5.7, InnoDB Engine

We have a huge transaction log table (6 million new rows per day) and we want to shard this huge table into multiple mysql instances and tables (so we have table_001, table_002, etc). Users often query their recent rows which means recent rows are hot while old rows are cold.

I'm wondering if mysql partitions are still needed in each sharded table (table_001, table_002, etc) in this case. My opinion is NO. If we already sharded the data into many sub-tables, why we need another kind of sub-tables by using partitions? (Partitions are actually separate tables internally right?)


Solution

  • There are 4 ways to split up a table:

    • "Sharding" -- some rows on each of several servers. This is useful for 'write scaling'. (Seems not applicable to you.)
    • "Partitioning" -- a special syntax that builds sub-tables, but reference it as if it were a single table. There are very few cases where performance is enhanced by such. We need to see your queries to discuss this further.
    • Manually splitting a table into table_001, etc -- Again and again, this is suggested on this forum. It is always shot down as a bad idea.
    • "Replication" -- By having lots of Slaves hanging off the one Master, you can gain "read scaling". This would allow more queries at the same time. (I don't see this as being the 'right' solution for you.) In Replication, you must write to one place, and you get many identical copies of the data.

    Will you be removing "old" data? If so, PARTITION BY RANGE(TO_DAYS(..)) makes purging much more efficient. But it does not necessarily improve SELECT performance; let's see your SELECTs.

    Will you be "summarizing" the data, a la Data Warehousing "reports"? If so, we can discuss this further.

    re: INSERTion rate

    With InnoDB, a single big table is essentially as easy to INSERT into as multiple small tables or multiple partitions. 200 is not a very big insertion rate.

    • Do you "batch" the inserts? That is, INSERT ... VALUES (...), (...), ...;? Or use LOAD DATA? Either of those is likely to sustain over 1000/sec.
    • Do you have lots of INDEXes? They are somewhat a burden on high insertion rate. Let's see SHOW CREATE TABLE so we can discuss further.
    • Do you have multiple threads doing INSERTs?
    • For really high insertion rates, see this .
    • SSDs allow even higher insertion rates.