Search code examples
mysqldatabaseinnodbdatabase-partitioning

MySQL - clarification on table partition / subpartition


I've a table with around 300 millions records in it (only for 60 days). The table structure is quite simple ..
It has 12 columns, which 4 of them are Indexed:

LogTime - DateTime
RegionID - Integer
ProvinceID - Integer
CityID - Integer

There is also a Unique Index:

UNIQUE INDEX UI_IDX (`LogTime`, `RegionID`, `ProvinceID`, `CityID`); 

The rest of fields are just values that mostly being fetched by query results.

The database-server is already tuned with good condition (~12GB for buffer-space and etc)

Well, For speeding up the queries I decided to partition the table on LogTime field. I achieved this goal by partitioning the table with PARTITION BY RANGE. Now I've 365 partitions for the table (1 partition for 1 day , 5 millions records for each partition).

The thing is we're not satisfied with query performance, and I was looking for a way to break-down the partitions to smaller pieces .. lets say partition a table on LogTime and RegionID ... I'm not sure if it is applicable in Sub-partition ...


Solution

  • For

    select  b.CityName, a.val1, a.val2
        from  DataRepo a
        left join  City b on a.CityID = b.CityID
        where  (a.LogTime >= '2015-08-01 00:00:00'
           and  a.LogTime <= '2015-08-05 00:00:00' )
          and  a.RegionID = 1
          and  a.ProvinceID = 14;
    

    The optimal index is

    INDEX(RegionID, ProvinceID, -- in either order
          LogTime)    -- last
    

    No PARTITIONing will help speed up the query above what that index can deliver.

    (Unrelated) Did you deliberately want to check 4 days plus one second?

    I like this design pattern:

        where  (a.LogTime >= '2015-08-01'
           and  a.LogTime <  '2015-08-01' + INTERVAL 4 DAY )
    

    Oh... This index would be even faster, because it is "covering":

    INDEX(RegionID, ProvinceID, -- in either order
          LogTime,    -- range
          CityID)     -- the rest of the fields used in the SELECT
    

    With that, the query will be performed entirely (aside from the JOIN) in the index, without touching the data. EXPLAIN will say "Using index".

    (Another aside) Is Region a subset of Province? Or vice versa? If so, you don't need both columns in the Fact table, do you? (Shrinking the Fact table is another technique for improving performance, since you are doubtless I/O-bound.)

    (More) Is Province INT? That's 4 bytes. If there are under 255 Provinces, use TINYINT UNSIGNED, which is only 1 byte. That will save nearly 1GB in the table, plus another GB in the index I proposed.