I've a table with around 300 millions records in it (only for 60 days). The table structure is quite simple ..
It has 12 columns, which 4 of them are Indexed:
LogTime - DateTime
RegionID - Integer
ProvinceID - Integer
CityID - Integer
There is also a Unique Index:
UNIQUE INDEX UI_IDX (`LogTime`, `RegionID`, `ProvinceID`, `CityID`);
The rest of fields are just values that mostly being fetched by query results.
The database-server is already tuned with good condition (~12GB for buffer-space and etc)
Well, For speeding up the queries I decided to partition the table on LogTime
field. I achieved this goal by partitioning the table with PARTITION BY RANGE
. Now I've 365 partitions for the table (1 partition for 1 day , 5 millions records for each partition).
The thing is we're not satisfied with query performance, and I was looking for a way to break-down the partitions to smaller pieces .. lets say partition a table on LogTime and RegionID ... I'm not sure if it is applicable in Sub-partition ...
For
select b.CityName, a.val1, a.val2
from DataRepo a
left join City b on a.CityID = b.CityID
where (a.LogTime >= '2015-08-01 00:00:00'
and a.LogTime <= '2015-08-05 00:00:00' )
and a.RegionID = 1
and a.ProvinceID = 14;
The optimal index is
INDEX(RegionID, ProvinceID, -- in either order
LogTime) -- last
No PARTITIONing
will help speed up the query above what that index can deliver.
(Unrelated) Did you deliberately want to check 4 days plus one second?
I like this design pattern:
where (a.LogTime >= '2015-08-01'
and a.LogTime < '2015-08-01' + INTERVAL 4 DAY )
Oh... This index would be even faster, because it is "covering":
INDEX(RegionID, ProvinceID, -- in either order
LogTime, -- range
CityID) -- the rest of the fields used in the SELECT
With that, the query will be performed entirely (aside from the JOIN
) in the index, without touching the data. EXPLAIN
will say "Using index".
(Another aside) Is Region a subset of Province? Or vice versa? If so, you don't need both columns in the Fact table, do you? (Shrinking the Fact table is another technique for improving performance, since you are doubtless I/O-bound.)
(More) Is Province INT
? That's 4 bytes. If there are under 255 Provinces, use TINYINT UNSIGNED
, which is only 1 byte. That will save nearly 1GB in the table, plus another GB in the index I proposed.