I need to ensure best performance for a table with 15M+ rows in a MySQL database hosted in AWS using Aurora (Small sized instance currently). The table is essentially for tracking the ownership and update timestamp of product units over time, along with each unit's other basic information like serial number.
The columns are as follows:
UnitId, ScanTime, Model, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId
CREATE TABLE `UnitHistory` (
`UnitId` bigint(20) NOT NULL,
`ScanTime` int(11) NOT NULL,
`Model` bigint(20) NOT NULL,
`SerialNumber` int(11) NOT NULL,
`MfrTimestamp` int(11) NOT NULL,
`UpdateTimestamp` int(11) DEFAULT NULL,
`CustomerId` bigint(20) DEFAULT NULL,
PRIMARY KEY (`UnitId`,`ScanTime`)
);
Rows will be added over time, but NEVER modified.
I chose UnitId and ScanTime as the primary key because those two together are sufficient to always be unique.
The query I'll most frequently use will ideally produce a list of all UnitId's for a specific Model, along with the unit's most up-to-date details. The following query will work, but will of course also return more rows than I need (redundant data):
SELECT UnitId, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId FROM UnitHistory WHERE Model=2500;
If there's a way to constrain that query so that only the row with the most recent ScanTime is returned for any given UnitId, that would be ideal. Otherwise I'll simply search the result for the row with the most recent ScanTime for each UnitId afterward.
The other very frequently used query will produce a basic set of details and history for any particular unit, like this:
SELECT ScanTime, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId FROM UnitHistory WHERE UnitId=1234567;
This query will primarily be used to track the change of ownership as it passes from the manufacturer to a customer, then back to the manufacturer for update, then out to perhaps a different customer, etc.
With the above scenario, what additional key(s) should I have in order to ensure good performance and low cost?
One cost factor is that I assume my working set should fit within RAM in order to avoid lots of IOs since AWS charges for IOs. My current database instance has 2 GB RAM, and for cost reasons I don't want to upgrade it.
For your query 1, you should have this index:
ALTER TABLE UnitHistory ADD INDEX (Model, ScanTime);
To get the most recent:
SELECT UnitId, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId
FROM UnitHistory WHERE Model=2500
ORDER BY ScanTime DESC LIMIT 1;
Here's a demo of using EXPLAIN to confirm the query uses the index (which is named "Model" after the first column of the index since I didn't give it a name in my test):
mysql> explain SELECT UnitId, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId FROM UnitHistory WHERE Model=2500 order by scantime desc limit 1;
+----+-------------+-------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | UnitHistory | NULL | ref | Model | Model | 8 | const | 1 | 100.00 | Using where |
+----+-------------+-------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
Your other query 1 is already searching by the left-most column of the primary key, so there's no need to add another index.
mysql> explain SELECT ScanTime, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId FROM UnitHistory WHERE UnitId=1234567;
+----+-------------+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | UnitHistory | NULL | ref | PRIMARY | PRIMARY | 8 | const | 1 | 100.00 | NULL |
+----+-------------+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------+
I can't predict whether your working set will fit in RAM, because I don't know the distribution of your data.