We have two tables, overview and details. There could be millions of rows in "overview", and each row of "overview" can have millions of rows associated with it in "details". The foreign key details.overview_id refers to overview.id. Most queries have the general form:

    SELECT * FROM details WHERE overview_id = xxx AND details.id > yyy AND details.id < zzz;
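For concreteness, here is a minimal sketch of the single-table layout and the query form above. It uses Python's built-in sqlite3 module purely as a stand-in database, since the post does not name the actual RDBMS. The payload column and the sample data are hypothetical.

```python
import sqlite3

# Stand-in database: an in-memory SQLite instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE overview (id INTEGER PRIMARY KEY);
    CREATE TABLE details (
        id          INTEGER PRIMARY KEY,
        overview_id INTEGER NOT NULL REFERENCES overview(id),
        payload     TEXT  -- hypothetical placeholder column
    );
""")

# Hypothetical sample data: odd detail ids belong to overview 1, even ids to overview 2.
conn.executemany("INSERT INTO overview (id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO details (id, overview_id, payload) VALUES (?, ?, ?)",
    [(i, 1 if i % 2 else 2, f"row {i}") for i in range(1, 21)],
)


def fetch_details(conn, overview_id, lo, hi):
    # The general query form, with xxx, yyy, zzz passed as bind parameters.
    return conn.execute(
        "SELECT * FROM details WHERE overview_id = ? AND details.id > ? AND details.id < ?"
        " ORDER BY details.id",
        (overview_id, lo, hi),
    ).fetchall()


rows = fetch_details(conn, 1, 5, 12)  # detail ids 7, 9, 11
```

This is the layout the post describes as too slow at its scale. The sketch only shows the shape of the schema and the query, not performance.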
If we keep all details in a single table, these queries will be too slow, even though the queries on details are almost always on primary keys.
More on the nature of the DB activity: INSERTs and UPDATEs on overview are infrequent. INSERTs on details happen at a rapid pace, while UPDATEs on that table almost never happen and bulk DELETEs happen occasionally.
In the past we used raw SQL to partition the table "details" by row in "overview". (In practice we did not actually partition. Instead we created new tables from a template. These tables had no overview_id column, which saved storage space; a separate table mapped each overview.id to the name of its partition table.) As a result, partition tables had to be created on the fly as new rows were inserted into overview, and dropped as rows were deleted from overview. All of this was managed inside the application. The application-database interaction has been blazing fast, but the application code is fairly complex and therefore hard to maintain. Also, with raw SQL scattered everywhere, it is hard to scale the DB horizontally; we would have to reinvent what most JPA providers have already done.
We are currently exploring mechanisms by which this partitioning can happen behind the scenes, possibly handled by a JPA provider (I understand that this is not part of the JPA spec), so that we can focus on the application while the underlying framework or layer takes care of the scalability issues.
I looked at OpenJPA Slice and EclipseLink. Both provide partition (shard) management across hosts, which we certainly need, but we also need partition management within a single host. That said, if there is a better or more elegant solution, or a totally different angle from which to look at this, I would be really glad to hear about it.
I would appreciate any insight you can provide.
Thanks.
Prajesh
Thank you all for your comments and answers so far. We decided to stick with what we already have (the table-per-overview approach described above), with minor modifications.