Where i work, we got a dilemma. We are using a database(MariaDB 10) that has 1 table that is growing very large(107.4GiB as i write this,. so 1.181 million rows..). This does off course affect the performance of the system. Me and a coworker had a discussion, he suggested using partitions on that table. This will likely increase the performance, but does not reduce the size of the DB. In previous times, i however, have been working on writing a cronjob that will move data older then 2 years from that table to a exact copy of the database on a other location. I feel that that is the more effective way. I expect that doing this will not only increase performance(except during the times when the cronjob is running) but i know that it will also reduce the size of the table. We don't expect that our customers are interested in this old data anyway.
Question is: What would you choose? I prefer my option, because old data is not used anyway and it keeps the main DB a lot cleaner, my coworker prefers his solution because it means less load at all times and customers can still access the old data.
I have read some of the pro's to use partitioning but haven't found a comparison yet between partitioning and moving old data to another database/place
The table in question uses several query's, This is the most important insert:
INSERT INTO ".$defaultDataTable." (
sensor_data_type_id,
sequence_number,
value,
flag,
datetime
) VALUES (
'".Database::esc($sdtid)."',
'".Database::esc($valueSequence)."',
'".Database::esc($value)."',
'".Database::esc($valueSensorDataFlagsExtended)."',
'".Database::esc($valueDateTime)."'
);
The data is selected in several pages of the application, but 1 example is the following.
SELECT
ws_sensor_data_type.sensor_data_type_id as sensor_data_type_id,
ws_sensor_data_type.name as sensor_data_type_name,
ws_sensor_data_type.equation_id as equation_id,
ws_sensor.name as sensor_name,
ws_equation.description as data_type_name,
ws_basestation.network_id as network_id,
ws_basestation.name as basestation_name,
ws_basestation.worldwide_id as worldwide_id,
ws_client.name as client_name,
ws_sensor.device_type_id as device_type,
ws_sensor.device_id as device_id
FROM
ws_sensor_data_type,
ws_sensor,
ws_basestation,
ws_client_basestation,
ws_client,
ws_equation
WHERE ws_sensor.sensor_id = ws_sensor_data_type.sensor_id
AND ws_sensor.basestation_id = ws_basestation.basestation_id
AND ws_basestation.basestation_id = ws_client_basestation.basestation_id
AND ws_client_basestation.client_id = ws_client.client_id
AND ws_sensor_data_type.equation_id = ws_equation.equation_id
AND ws_sensor_data_type.sensor_data_type_id = '".Database::esc($sdtid)."'
");
In this example, the data, along with some other information is being selected to create a .CSV export file.
The create table statement will follow as i am creating a copy of the Development DB right now to test partitioning on.
We do not use UUID's so that should not be a problem.
It depends.
Partitioning does not inherently improve performance. Only a very limited number of use cases show any performance improvement. More details .
If you are only fetching "recent" rows from the table and you have adequate indexing, then "neither" is the answer -- your million rows could grow to a billion without any performance degradation.
If you are using UUIDs, you are doomed. Performance declines terribly once the data is too big to be cached.
You have done some "hand waving". So have I. If you want to continue this discussion, please provide more specifics. CREATE TABLE
, sample queries, proposed partition mechanism, proposed mechanism for accessing 'old' data, etc.