I'm writing a web application that must be multi-tenant. I'm using JPA for the persistence layer and I'm evaluating EclipseLink with interest.
The multi-tenant strategy I want to use is: one schema per customer. Hibernate supports such a strategy (http://docs.jboss.org/hibernate/orm/4.2/devguide/en-US/html/ch16.html#d5e4771) and I've already used it with success. However, AFAIK it supports it only when using the native Hibernate API, while I want to use JPA.
EclipseLink, on the other hand, supports single-table and multi-table multi-tenancy strategies. However, it also supports partitioning and with a simple custom partitioning policy I may easily set up one partition for each customer.
The first question might be whether using partitioning for this use case is appropriate or not.
The main problem, however, is that the customer base may (hopefully) grow over time, so I have to make EclipseLink "know" about new customers dynamically (i.e. without restarting the webapp). From what I understand, to set up partitioning in EclipseLink I have to set up my persistence unit with different "connection pools" (or "nodes"): every node has a name and its own configured datasource. The partitioning policy then selects the node to use by its name. So far so good, but I plan to set up my persistence unit using Spring's LocalContainerEntityManagerFactoryBean. I could discover customers dynamically on startup, when the LocalContainerEntityManagerFactoryBean is processed, and pass all the needed properties for all the nodes/customers at that time. But what happens if a new customer is added afterwards? I don't think that changing persistence unit properties dynamically will have any effect on the already constructed EntityManagerFactory singleton instance, and I fear EclipseLink will complain if I request a partition for which no corresponding node was known when the EntityManagerFactory was created. Correct me if I'm wrong.
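For the startup-time part, a minimal sketch of declaring one EclipseLink connection pool per customer known at startup could look like this. It assumes EclipseLink's `eclipselink.connection-pool.<pool>.nonJtaDataSource` property naming (worth double-checking against the EclipseLink documentation for the version in use) and a hypothetical `jdbc/customer_<id>` JNDI naming convention; `CustomerPoolProperties` is an illustrative name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Builds persistence-unit properties that declare one EclipseLink connection
// pool per customer discovered at startup.
final class CustomerPoolProperties {

    // Prefix of EclipseLink's per-pool properties: eclipselink.connection-pool.<pool>.<property>
    static final String POOL_PREFIX = "eclipselink.connection-pool.";

    // ASSUMPTION: hypothetical JNDI naming convention, one DataSource per customer.
    static String jndiNameFor(String customerId) {
        return "jdbc/customer_" + customerId;
    }

    // One nonJtaDataSource entry per customer; the pool name is the customer id,
    // so it matches the name the partitioning policy will later ask for.
    static Map<String, Object> forCustomers(List<String> customerIds) {
        Map<String, Object> props = new LinkedHashMap<>();
        for (String customerId : customerIds) {
            props.put(POOL_PREFIX + customerId + ".nonJtaDataSource", jndiNameFor(customerId));
        }
        return props;
    }

    public static void main(String[] args) {
        // Customers found at startup (e.g. by querying a central database).
        Map<String, Object> props = forCustomers(List.of("acme", "globex"));
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The map would be handed to LocalContainerEntityManagerFactoryBean through setJpaPropertyMap(...) before the factory is built. A customer added after that point simply has no entry, which is exactly the limitation in question.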
I think that declaring the LocalContainerEntityManagerFactoryBean scope as "prototype" would be a very bad idea, and I doubt it would work at all. On the other hand, since a customer interaction is bound to a specific HTTP session, I could use a "middle" approach and declare the LocalContainerEntityManagerFactoryBean scope as "session", but in that case I think I would have to deal with problems like increased memory consumption and shared cache coordination between multiple EntityManagerFactories (one for each customer using the application at a given time).
If I can't make this strategy work, I think I'll have to abandon partitioning as a whole and fall back to a "dynamic data source routing" approach, but in this case I'm concerned about the EclipseLink shared cache consistency (I think I'll have to disable it completely and this would be a real disadvantage).
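The shared-cache concern can be illustrated with a deliberately simplified model (not EclipseLink's actual cache): if a cache sitting above a routing data source identifies entities only by type and primary key, one tenant can be served another tenant's row. The class and record names below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simplified model of a shared entity cache placed above a tenant-routing
// DataSource. Illustrative only: this is NOT EclipseLink's cache implementation.
final class SharedCacheClash {

    record Customer(String tenant, long id, String name) {}

    // Key made only of (entity type, primary key), as an identity cache would use.
    record IdKey(Class<?> type, long id) {}

    // Key that also carries the tenant, so equal ids in different tenants differ.
    record TenantKey(String tenant, Class<?> type, long id) {}

    private final Map<Object, Customer> cache = new HashMap<>();

    // Returns the cached instance for the key, or loads it from the
    // currently routed database and caches it.
    Customer find(Object key, Supplier<Customer> loadFromCurrentTenantDb) {
        return cache.computeIfAbsent(key, k -> loadFromCurrentTenantDb.get());
    }

    public static void main(String[] args) {
        SharedCacheClash shared = new SharedCacheClash();
        // Tenant T1 loads customer id=1 through its routed data source.
        shared.find(new IdKey(Customer.class, 1L), () -> new Customer("T1", 1L, "Alice"));
        // Tenant T2 asks for its own id=1 but is served T1's cached row.
        Customer seenByT2 = shared.find(new IdKey(Customer.class, 1L),
                () -> new Customer("T2", 1L, "Bob"));
        System.out.println("id-only key, T2 sees: " + seenByT2);

        SharedCacheClash tenantAware = new SharedCacheClash();
        tenantAware.find(new TenantKey("T1", Customer.class, 1L), () -> new Customer("T1", 1L, "Alice"));
        Customer ownRow = tenantAware.find(new TenantKey("T2", Customer.class, 1L),
                () -> new Customer("T2", 1L, "Bob"));
        System.out.println("tenant-aware key, T2 sees: " + ownRow);
    }
}
```

Including the tenant in the key avoids the clash, but an identity cache keyed by primary key alone has no such dimension, which is why disabling the shared cache looks like the safe default with plain data source routing.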
Thanks in advance for any feedback on this.
Honestly, I didn't try Chris's suggestion, but opted for a more fine-tuned solution. This is my solution.
- the customer of the currently logged-in user is retrieved through Spring Security's SecurityContextHolder
- a custom PartitioningPolicy determines the customer of the currently logged-in user as described in the previous point, and then returns a list containing a single Accessor that identifies that customer's partition
- all my tables must be partitioned and I don't want to specify that on EVERY entity with annotations, so I registered this partitioning policy with EclipseLink on startup and set it as the default one; briefly:
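```java
import org.eclipse.persistence.jpa.JpaEntityManagerFactory;
import org.eclipse.persistence.sessions.server.ServerSession;
// Unwrap the JPA EntityManagerFactory to reach EclipseLink's ServerSession
JpaEntityManagerFactory jpaEmf = entityManagerFactory.unwrap(JpaEntityManagerFactory.class);
ServerSession serverSession = jpaEmf.getServerSession();
// Register the policy with the project and make it the session-wide default
serverSession.getProject().addPartitioningPolicy(myCustomerPolicy);
serverSession.setPartitioningPolicy(myCustomerPolicy);
```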
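The decision such a policy makes can be modeled without any EclipseLink types. In this sketch, the ThreadLocal-backed `CurrentCustomer` holder is a hypothetical stand-in for reading the customer id through SecurityContextHolder (whose default storage strategy is also thread-local), and the returned pool name stands in for the single Accessor; the real class would extend EclipseLink's PartitioningPolicy.

```java
import java.util.List;

final class CustomerPartitionChooser {

    // HYPOTHETICAL stand-in for SecurityContextHolder: the logged-in user's
    // customer id, bound to the current thread.
    static final class CurrentCustomer {
        private static final ThreadLocal<String> ID = new ThreadLocal<>();

        static void set(String customerId) { ID.set(customerId); }
        static void clear() { ID.remove(); }
        static String get() { return ID.get(); }
    }

    // Picks exactly one pool, named after the current customer, mirroring a policy
    // that returns a list with a single Accessor for that customer's partition.
    static List<String> poolsForCurrentQuery() {
        String customerId = CurrentCustomer.get();
        if (customerId == null) {
            // No customer context: refuse rather than guess a partition.
            throw new IllegalStateException("No logged-in customer bound to this thread");
        }
        return List.of(customerId);
    }

    public static void main(String[] args) {
        CurrentCustomer.set("acme");
        try {
            System.out.println(poolsForCurrentQuery());
        } finally {
            CurrentCustomer.clear();
        }
    }
}
```

Failing fast when no customer is bound keeps a query issued outside a logged-in request from silently landing in some arbitrary partition.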
Then, to dynamically add data sources to EclipseLink (they are called "connection pools" in EclipseLink terminology), so that the customer id specified by the policy above is matched against a known connection pool, I do the following:
- when a user logs in, a listener checks whether EclipseLink already knows about a connection pool identified by the user's customer id; if it does, we're done, because EclipseLink can correctly handle the partition; otherwise a new connection pool is created and added to EclipseLink; proof of concept:
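```java
// Runs on user login, before any customer-specific JPA query.
// createDataSourceForCustomer(...) and determineDbVendorPlatform(...) are my own helpers.
String customerId = principal.getCustomerId();
JpaEntityManagerFactory jpaEmf = entityManagerFactory.unwrap(JpaEntityManagerFactory.class);
ServerSession serverSession = jpaEmf.getServerSession();
if (!serverSession.getConnectionPools().containsKey(customerId)) {
    DataSource customerDataSource = createDataSourceForCustomer(customerId);
    DatabaseLogin login = new DatabaseLogin();
    login.useDataSource(customerId);
    // Use the DataSource instance directly instead of a JNDI lookup
    login.setConnector(new JNDIConnector(customerDataSource));
    Class<? extends DatabasePlatform> databasePlatformClass = determineDbVendorPlatform(customerId);
    // Class.newInstance() is deprecated; go through the no-arg constructor instead
    login.usePlatform(databasePlatformClass.getDeclaredConstructor().newInstance());
    // The pool name must equal the customer id returned by the partitioning policy
    ConnectionPool connectionPool = new ExternalConnectionPool(customerId, login, serverSession);
    connectionPool.startUp();
    serverSession.addConnectionPool(connectionPool);
}
```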
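One detail this proof of concept leaves open is two users of the same new customer logging in at the same moment: both could pass the `containsKey` check and each create a pool. A minimal sketch of an atomic register-if-absent, using `ConcurrentHashMap.computeIfAbsent` on a hypothetical registry of our own (EclipseLink's own pool map cannot simply be swapped for a concurrent one), might look like this:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Registry that creates each customer's pool at most once, even when several
// threads ask for the same new customer at the same time.
// P is a placeholder for EclipseLink's ConnectionPool; the factory stands in
// for the createDataSourceForCustomer / ExternalConnectionPool / startUp steps.
final class PoolRegistry<P> {
    private final Map<String, P> pools = new ConcurrentHashMap<>();
    private final Function<String, P> factory;

    PoolRegistry(Function<String, P> factory) {
        this.factory = factory;
    }

    // ConcurrentHashMap.computeIfAbsent is atomic, so the factory is applied
    // at most once per customer id.
    P ensurePool(String customerId) {
        return pools.computeIfAbsent(customerId, factory);
    }

    boolean isRegistered(String customerId) {
        return pools.containsKey(customerId);
    }

    public static void main(String[] args) throws InterruptedException {
        PoolRegistry<String> registry = new PoolRegistry<>(id -> "pool-" + id);
        // Two "first logins" of the same customer racing each other.
        Thread a = new Thread(() -> registry.ensurePool("acme"));
        Thread b = new Thread(() -> registry.ensurePool("acme"));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("acme registered: " + registry.isRegistered("acme"));
    }
}
```

Against the real ServerSession, the same effect would come from guarding the check-and-add with a lock (for example a synchronized block on a dedicated lock object) with the pool creation inside it.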
The user login operation is of course performed against a central database (or any other source of authentication), so the code above runs before any customer-specific JPA query is executed, and hence the customer's connection pool is added to EclipseLink before the partitioning policy ever references it.
There's an important aspect to take into consideration, though. In EclipseLink, data partitioning means that an identifiable piece of data (= an entity instance) is either in just one partition or replicated identically in multiple partitions. Entity instance identity is determined by the identifier (= primary key). This means that there must not be two different entity instances of type E with the same id=x for two different customers/tenants T1 and T2, otherwise EclipseLink might treat them as the exact same entity instance. This may lead to data from different customers being mixed up when reading or writing within a single JPA session, which would be a disaster. Possible solutions:
The last little problem to solve to implement option 2 correctly is that, although the EclipseLink documentation says it's possible to specify a connection pool (= data source) dedicated to table sequencing using the eclipselink.connection-pool.sequence configuration option, this option seems to be ignored when a default partitioning policy is set as described above. In fact, my customer partitioning policy gets invoked for EVERY query, even those used for id allocation. For this reason, the policy must intercept those queries and route them to the central data source.
I couldn't find a definitive solution to this problem, but the best options I could think of are:
I opted for option 2 by defining my id generation mappings as follows:
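```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.TableGenerator;

@Entity
public class MyEntity {
    @Id
    // No table name given, so EclipseLink falls back to its default SEQUENCE table
    @TableGenerator(name = "MyEntity_SEQUENCE", allocationSize = 10)
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "MyEntity_SEQUENCE")
    private Long id;
}
```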
This makes EclipseLink use a table named SEQUENCE containing one row whose SEQ_NAME column value is MyEntity_SEQUENCE. The query used to update this sequence for id allocation will also be named MyEntity_SEQUENCE, so the partitioning policy can recognize it by name and route it to the central data source, and we're done.
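With allocationSize = 10, each round trip to the sequence table reserves a block of ten ids. Because that table is reached through the central data source, all tenants draw from the same counter, so no two tenants can receive the same MyEntity id, which is what rules out the identity clash described earlier. The following is a simplified model of that arithmetic (not EclipseLink's sequencing code), contrasting one central row with one row per tenant schema; `TableSequenceModel` is a hypothetical name.

```java
// Simplified model of table-based id preallocation (allocationSize = 10).
// It imitates, but is not, EclipseLink's sequencing code: one row's SEQ_COUNT
// is bumped by the allocation size and ids are handed out from that block.
final class TableSequenceModel {
    private final int allocationSize;
    private long seqCount = 0;  // SEQ_COUNT of the MyEntity_SEQUENCE row
    private long next = 1;      // next id to hand out from the reserved block
    private long last = 0;      // last id of the reserved block (none reserved yet)

    TableSequenceModel(int allocationSize) {
        this.allocationSize = allocationSize;
    }

    // Returns the next id; reserves a new block (one sequence-table update)
    // when the current block is used up.
    synchronized long nextId() {
        if (next > last) {
            next = seqCount + 1;
            seqCount += allocationSize;
            last = seqCount;
        }
        return next++;
    }

    synchronized long seqCount() {
        return seqCount;
    }

    public static void main(String[] args) {
        // One central row shared by all tenants: ids never repeat across tenants.
        TableSequenceModel central = new TableSequenceModel(10);
        long forTenant1 = central.nextId();
        long forTenant2 = central.nextId();
        System.out.println("central: T1 got " + forTenant1 + ", T2 got " + forTenant2);

        // One row per tenant schema: both tenants start from the same values.
        TableSequenceModel t1 = new TableSequenceModel(10);
        TableSequenceModel t2 = new TableSequenceModel(10);
        System.out.println("per-tenant: T1 got " + t1.nextId() + ", T2 got " + t2.nextId());
    }
}
```

The per-tenant case is the one to avoid: two schemas with their own SEQUENCE tables hand out the same ids, and EclipseLink would then see two different rows as one entity.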
However, I made my partitioning policy configurable, so that I can switch from one sequence-query identification strategy to the other at any time in case a change in the EclipseLink implementation breaks this heuristic.
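Turning the naming convention into a routing rule, and keeping that rule swappable, might look like the sketch below. It models only the decision: the query name is a plain string standing in for the name carried by EclipseLink's query object, "central" is a hypothetical pool name, and the suffix and whitelist predicates are two illustrative heuristics, not necessarily the ones used here.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Id-allocation queries go to the central pool, everything else to the
// current customer's pool. The rule that recognizes sequence queries is a
// swappable Predicate over the query name.
final class SequenceAwareRouting {
    static final String CENTRAL_POOL = "central"; // ASSUMPTION: hypothetical pool name

    private final Predicate<String> isSequenceQuery;

    SequenceAwareRouting(Predicate<String> isSequenceQuery) {
        this.isSequenceQuery = isSequenceQuery;
    }

    // Heuristic based on the naming convention: generator names end with a suffix.
    static Predicate<String> bySuffix(String suffix) {
        return name -> name != null && name.endsWith(suffix);
    }

    // Alternative heuristic: an explicit set of known generator names.
    // (The null check matters: Set.of(...).contains(null) throws.)
    static Predicate<String> byNames(Set<String> names) {
        return name -> name != null && names.contains(name);
    }

    List<String> poolsFor(String queryName, String currentCustomerId) {
        return isSequenceQuery.test(queryName)
                ? List.of(CENTRAL_POOL)
                : List.of(currentCustomerId);
    }

    public static void main(String[] args) {
        SequenceAwareRouting routing = new SequenceAwareRouting(bySuffix("_SEQUENCE"));
        System.out.println(routing.poolsFor("MyEntity_SEQUENCE", "acme"));
        System.out.println(routing.poolsFor("findAllOrders", "acme"));
    }
}
```

Passing the predicate in from configuration is what allows switching identification strategies without touching the routing logic itself.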
This is essentially the whole picture. So far, it has been working well. Feedback, improvements and suggestions are welcome.