Tags: mysql, database, database-design, data-warehouse

Best way to archive MySQL table data (huge data)


So here is the story.

Table pattern

`item` `on_date` `value1` `value2` `value3` `value4` ... `value10`

All the values are important parameters in my table and are stored every day for each item. Every day I capture data for 100,000 rows, which in turn becomes approximately 3,000,000 rows in a month.

The table itself performs fine: the indexes and keys are built to handle the data, and I have no issues because I keep deleting data older than 30 days.
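For reference, here is roughly what the table and the daily purge look like (the column types and key choices below are illustrative, not exact):

    -- Illustrative schema matching the pattern above; names/types are assumed.
    CREATE TABLE Current (
        item     INT UNSIGNED  NOT NULL,
        on_date  DATE          NOT NULL,
        value1   DECIMAL(12,4),
        -- value2 through value9 omitted for brevity
        value10  DECIMAL(12,4),
        PRIMARY KEY (item, on_date),
        KEY idx_on_date (on_date)
    ) ENGINE=InnoDB;

    -- Daily purge of rows older than 30 days.
    DELETE FROM Current
     WHERE on_date < CURRENT_DATE - INTERVAL 30 DAY;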

Now, I would like to store the data that I delete in a MySQL table that would be used purely for historical data analysis.

So, I tried a technique of concatenating all the columns into one pipe-separated (|) column and storing that in a table:

item | on_date | value1 | value2 | ... | value10

But this didn't solve the problem; the data was still huge.

I browsed through the community and the MySQL documentation without finding much help. Kindly suggest the best pattern or technology to solve my problem.


Solution

  • It is unclear what you are asking.

    "Archive" means to copy the data somewhere else, then delete it from here.

    "Backup" means to copy the data elsewhere, but keep it here.

    You say you want to "store the data", then "delete" it and use it for "historical...". If it is deleted, how can you use it? Maybe you mean that you have "archived" it out of the main table into another table?

    Case 1: Backup daily, archive or delete after 30 days. See Partitioning and Backup.

    Case 2: Keeping only one day. Well, one way is to use Case 1, but do things hourly instead of daily.

    Case 3: You have two tables: the current 30 days, plus a bigger table of older data. This might involve partitioning and "transportable tablespaces".

    Partitioning: See http://mysql.rjweb.org/doc.php/partitionmaint (see Cases 1 and 4); a sketch follows below.

    Backup: See mysqldump (in the manual) or Xtrabackup (Percona.com)
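    As a rough illustration of Case 1 / Case 3 style partitioning by date (the partition names, ranges, and column details below are made up):

    -- Hypothetical: range-partition by day, one partition per week.
    -- Note: the partition key (on_date) must be part of every unique key.
    CREATE TABLE Current (
        item     INT UNSIGNED  NOT NULL,
        on_date  DATE          NOT NULL,
        -- value1 through value10 as before
        PRIMARY KEY (item, on_date)
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(on_date)) (
        PARTITION p2024_w01 VALUES LESS THAN (TO_DAYS('2024-01-08')),
        PARTITION p2024_w02 VALUES LESS THAN (TO_DAYS('2024-01-15')),
        PARTITION future    VALUES LESS THAN MAXVALUE
    );

    -- Dropping (or exporting) the oldest week is then a fast metadata operation:
    ALTER TABLE Current DROP PARTITION p2024_w01;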

    Note that it is possible to do

    INSERT INTO db2.History
        SELECT * FROM db1.Current
            WHERE ...
    

    That would copy rows from one table in one database to a table in another database. However, partitioning and using "transportable tablespaces" to move one partition would be a lot faster.
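    If you stick with INSERT ... SELECT, the copy is normally paired with a delete of the same rows. A minimal sketch, with an arbitrary cutoff date:

    -- Copy rows older than the cutoff into the history table ...
    INSERT INTO db2.History
        SELECT * FROM db1.Current
            WHERE on_date < '2024-01-01';

    -- ... then remove exactly those rows from the live table.
    DELETE FROM db1.Current
        WHERE on_date < '2024-01-01';

    For millions of rows, run both statements in chunks (see Plan A below) to avoid a huge transaction and long locks.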

    (My opinion:) 3 million rows (86th percentile) in a month is "medium" sized. After a year (94th percentile), I might call it "large", but not yet "huge".

    (after Comment...)

    Plan A - zero disruption:

    Probably pt-archiver would be the core tool for moving months-old data to some other machine; a plain-SQL alternative is sketched below. (See also http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks )
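    The chunked delete from that link, sketched in plain SQL (the batch size and cutoff date are arbitrary):

    -- Repeat this statement until it affects 0 rows; small batches keep
    -- each transaction short and avoid long locks on the live table.
    DELETE FROM Current
     WHERE on_date < '2024-01-01'
     ORDER BY on_date
     LIMIT 1000;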

    Plan B - convert to partitioning:

    This involves a major one-time disruption to add partitioning to the table. The benefit is that carving off a month's data has virtually zero impact.

    Decide how long to keep data before archiving, then split on weeks or months such that you have about 20-50 partitions. Then use "transportable tablespaces" to break off the oldest partition and move it to another machine, and use REORGANIZE PARTITION to add a new partition each week or month; a sketch follows. Some details are in the first link above.
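    A rough sketch of that recurring maintenance, assuming monthly partitions and a catch-all `future` partition (all table and partition names here are hypothetical):

    -- Add next month's partition by splitting the catch-all partition.
    ALTER TABLE Current
        REORGANIZE PARTITION future INTO (
            PARTITION p2024_08 VALUES LESS THAN (TO_DAYS('2024-09-01')),
            PARTITION future   VALUES LESS THAN MAXVALUE
        );

    -- Swap the oldest partition into an empty, non-partitioned table with the
    -- identical structure; its tablespace can then be exported with
    -- FLUSH TABLES staging FOR EXPORT and imported on the archive server.
    ALTER TABLE Current EXCHANGE PARTITION p2024_01 WITH TABLE staging;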

    Note that you need to rethink the indexes when adding (or removing) PARTITIONing. I suggest you experiment with a large set of data to make sure the indexing is optimal. (If you want to discuss it here, start a new Question and include the main queries, plus SHOW CREATE TABLE both now and after adding PARTITION BY RANGE(..).)