We have a mysql tracking database which has ~100M rows. We generally run queries group by day for certain action, unique visits, etc. The problem is that queries are getting slow to generate the monthly report because of the interplay of the indexes (we have a range scan for the date then queries based on multiple other fields.)
To improve performance, we've switched to doing unions based on the date to avoid the range scan and the performance is much better. So the idea was presented that perhaps we should just shard by day with a different table for each day. The advantages seem to be:
Has anyone heard of or tried this approach? Are there any foreseeable problems?
Note: We are considering other noSQL approaches - but we would like to know if this is a valid approach if we decide to stay with MySQL (so please don't suggest "try XYZ noSQL DB"). Also, I understand we can just get a much better machine and that in the scope of data sets, this is not that big - but we don't want to waste money on a larger machine if a smaller will work without a lot of extra work.
It seems you may take a look at MySQL Partitioning.
Partitioning enable you to distribute portions of individual tables across a file system according to rules which you can set largely as needed. In effect, different portions of a table are stored as separate tables in different locations. The user-selected rule by which the division of data is accomplished is known as a partitioning function, which in MySQL can be the modulus, simple matching against a set of ranges or value lists, an internal hashing function, or a linear hashing function. The function is selected according to the partitioning type specified by the user, and takes as its parameter the value of a user-supplied expression. This expression can be a column value, a function acting on one or more column values, or a set of one or more column values, depending on the type of partitioning that is used.
It seems a in your case a hash partition on month part of the date could be useful.
CREATE TABLE Mydata (id INT, amount DECIMAL(7,2), tr_date DATE)
ENGINE=INNODB
PARTITION BY HASH( MONTH(tr_date) )
PARTITIONS 12;
The partitioning solution will be conceptually the same as yours, but RDBMS will handle many aspects for you.