Tags: mysql, myisam

MySQL Partitioning, Delete old data from multiple related tables


I am new to MySQL partitioning, so any example would be appreciated.

I am trying to create a sort of ageing mechanism for data that is distributed across several MyISAM tables.

My question will actually include several sub-questions.

The relevant tables are:

  1. The first table contains raw data arriving at a high input rate (each record has an auto-incremented id).
  2. The second table contains processed results, one result record per raw data record (each result record stores the auto-incremented id of its source raw data record).

Questions:

  1. I need to partition the raw data table and the result table in the same way, so that each partition in either table holds only 10 weeks of data (each raw data record contains a unix timestamp field). How do I do this? Can someone write a small example case for two such tables?
  2. I want to be able to change the 10-week constraint on the fly.
  3. I want the previous partition (from 10 weeks earlier) to be deleted automatically whenever the current partition fills up or a new partition is created.
  4. I don't want the auto-increment id to overflow. As far as I understand, the ids are unique per partition only, so if I'm not mistaken the auto-increment id will start from zero in the next partition? But what if the previous partition still exists, will I have duplicate ids? How do I know to reference only the latest id when I present a result record?
  5. I want to load raw data using LOAD DATA INFILE ... instead of multiple inserts. Is MySQL's partitioning functionality affected?

And the last question: would you suggest some other approach to implementing an ageing mechanism? (I am writing a Java product that processes around 1 GB of raw data per day and stores the results in MySQL.)


Solution

  • It's hard to give a definitive answer to this question, since it depends on your data. But let me give you some things to think about.

    1. I assume we're talking about some kind of logs with recent data (so not spanning multiple years). You can partition by range. You could add one field to your table with the year/week number (i.e. 201201, 201202, etc.). If this question is related to your question about importing into multiple tables, you can easily set that field in the import script. A DDL sketch for two such tables follows this list.

    2. On the fly, as in repartitioning your data on the fly (70 GB?)? I would not recommend it. But you could do it if you had the week number in there. If you later want to change the window to, say, 12 days, you could add a column for the date and partition by that.

    3. Well, it won't be deleted automatically, but a cron job can handle that, right? Just check how many partitions there are, and if there are more than, say, 3, drop the oldest one (see the maintenance sketch after this list).

    4. MySQL requires every unique key of a partitioned table (including the primary key) to contain the column you partition by, so with an auto-increment id you end up with a composite primary key. Therefore you can never fully rely on the auto-increment id alone. I don't see a way around this; the DDL sketch after this list shows such a composite key.

    5. I'm not sure what you mean.
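
    Below is a minimal DDL sketch for points 1 and 4. It assumes the import script fills in a week-number column (here called yearweek_no) next to the unix timestamp; all table, column, and partition names are illustrative, not taken from the question.

```sql
-- Raw data table: one partition per 10-week window, keyed by a year/week number
-- (e.g. 201201 = week 1 of 2012). Partition boundaries here are examples.
CREATE TABLE raw_data (
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    yearweek_no INT UNSIGNED    NOT NULL,   -- filled in by the import script
    captured_at INT UNSIGNED    NOT NULL,   -- the unix timestamp field
    payload     TEXT,
    PRIMARY KEY (id, yearweek_no)           -- partition column must be in every unique key
) ENGINE=MyISAM
PARTITION BY RANGE (yearweek_no) (
    PARTITION p201210 VALUES LESS THAN (201211),  -- weeks 01-10 of 2012
    PARTITION p201220 VALUES LESS THAN (201221)   -- weeks 11-20 of 2012
);

-- Result table, partitioned the same way; raw_id holds the id of the source raw record.
CREATE TABLE result_data (
    raw_id      BIGINT UNSIGNED NOT NULL,
    yearweek_no INT UNSIGNED    NOT NULL,   -- copied from the raw record
    result      TEXT,
    PRIMARY KEY (raw_id, yearweek_no)
) ENGINE=MyISAM
PARTITION BY RANGE (yearweek_no) (
    PARTITION p201210 VALUES LESS THAN (201211),
    PARTITION p201220 VALUES LESS THAN (201221)
);
```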
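
    For point 3, the cron job could run maintenance statements along these lines, assuming tables partitioned by a week-number column as in the DDL sketch; the partition names and week numbers are placeholders.

```sql
-- Add the partition for the next 10-week window before its data arrives
-- (new RANGE partitions can only be added above the current highest boundary).
ALTER TABLE raw_data    ADD PARTITION (PARTITION p201230 VALUES LESS THAN (201231));
ALTER TABLE result_data ADD PARTITION (PARTITION p201230 VALUES LESS THAN (201231));

-- Drop the oldest partition once too many exist; dropping a partition discards
-- its rows immediately, without a row-by-row DELETE.
ALTER TABLE raw_data    DROP PARTITION p201210;
ALTER TABLE result_data DROP PARTITION p201210;

-- The cron job can count the existing partitions via information_schema:
SELECT PARTITION_NAME
FROM   information_schema.PARTITIONS
WHERE  TABLE_SCHEMA = DATABASE()
  AND  TABLE_NAME   = 'raw_data'
ORDER  BY PARTITION_ORDINAL_POSITION;
```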

    If your data is just some logs in chronological order, then you might simply use a separate table for each period. Before you start the new period (at 00:00), check the last id of the previous table, create a new table, and set its auto-increment to that value + 1. Then your import decides when a new period begins, so it can easily be changed. Your import script can use a small bookkeeping table in which it stores the next period. A rough sketch of this approach follows below.
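
    As a rough illustration of that alternative (table names like raw_data_201211 are purely hypothetical), the switch-over at the start of a period could look like this:

```sql
-- Find the last id used in the current period's table; suppose it returns 123456.
SELECT MAX(id) FROM raw_data_201211;

-- Create the next period's table with the same structure and continue the id sequence.
CREATE TABLE raw_data_201221 LIKE raw_data_201211;
ALTER TABLE raw_data_201221 AUTO_INCREMENT = 123457;

-- Ageing then becomes a plain DROP of the oldest period's table.
DROP TABLE raw_data_201201;
```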