This scenario uses a simple OneToMany relationship with cascade persist in both directions.
Many:
@javax.persistence.Entity(name = "Many")
public class Many {

    @javax.persistence.ManyToOne(cascade = CascadeType.PERSIST)
    protected One one;

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long primaryKey;

    public void setM(One one) {
        this.one = one;
        // comment out this line and performance becomes stable
        this.one.getMany().add(this);
    }

    // other setters, getters, etc...
}
One:
@javax.persistence.Entity(name = "One")
public class One {

    // mappedBy must match the field name in Many ("one", not "m")
    @javax.persistence.OneToMany(mappedBy = "one", cascade = CascadeType.PERSIST)
    protected java.util.Set<Many> many = com.google.common.collect.Sets.newHashSet();

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long primaryKey;

    private String name;

    // setters, getters, etc...
}
Test:
private static final Stopwatch sw = Stopwatch.createUnstarted(); // Guava Stopwatch

public static void main(String[] args) {
    while (true) {
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("test-pu");
        EntityManager em = emf.createEntityManager();
        for (int i = 0; i < 100; i++) {
            sw.reset();
            sw.start();
            persistMVs(emf, em);
            System.err.println("Elapsed: " + sw.elapsed(TimeUnit.MILLISECONDS) + " ms");
        }
        em.close();
        emf.close();
    }
}
private static void persistMVs(EntityManagerFactory emf, EntityManager em) {
    em.getTransaction().begin();
    One one = getOrCreateOne(em);
    for (int i = 0; i < 200; i++) {
        Many many = new Many();
        many.setM(one);
        em.persist(many);
    }
    em.getTransaction().commit();
}
The test is an endless loop that inserts 20,000 Many entities associated with a single One entity per cycle. Each cycle of the while loop begins by creating a new EntityManagerFactory, to show the negative performance effect of the growing database. The expected behavior would be that the insertion time of the entities does not increase drastically; however, after each while cycle there is an order-of-magnitude increase.
Notes:
The time is spent in the em.persist(many) call (I measured it).
Why would the initial size of the database matter in this case? Should I consider this behavior as a bug?
Just to expand on Predrag's answer: traversing a 1:M relationship not only has the cost of bringing in the entities and expanding the object graph, but those entities also remain managed within the persistence context. Because your test reuses the same EntityManager for repeated transactions, the cache of managed entities continues to grow with each iteration. This cache of managed entities must be traversed and checked for changes every time the context is synchronized with the database, which occurs on flush, on transaction commit, or even before queries.
If you must bring in large object graphs, you can mitigate this either by releasing and obtaining a new EntityManager for each transactional boundary, or by occasionally flushing and clearing the EntityManager. Either option releases some of the managed entities, so the provider does not need to check them all for changes on each commit.
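The flush-and-clear option can be sketched as follows. This is a minimal illustration, not JPA API: `BatchPersister`, `batchSize`, and the callback wiring are made-up names, and the real EntityManager calls appear only in the comments so the batching logic itself stays self-contained.

```java
import java.util.function.Consumer;

// Illustrative sketch (BatchPersister is not a JPA class): persist items in
// batches, draining the persistence context every `batchSize` items so the
// set of managed entities the provider must dirty-check stays small.
public class BatchPersister {

    public static <T> int persistInBatches(Iterable<T> items, int batchSize,
                                           Consumer<T> persist,    // e.g. em::persist
                                           Runnable flushAndClear  // e.g. () -> { em.flush(); em.clear(); }
                                           ) {
        int count = 0;
        for (T item : items) {
            persist.accept(item);
            if (++count % batchSize == 0) {
                flushAndClear.run(); // write pending inserts, then detach everything
            }
        }
        flushAndClear.run(); // drain the final (possibly partial) batch
        return count;
    }
}
```

Note that after em.clear() every previously managed entity, including the shared One, is detached, so in the scenario above you would need to re-read or merge it before attaching further Many instances to it.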
Edit: Your "Many" class has overridden the hashCode method and builds its hash code from the hash code of its referenced "One" together with its own primary key. This causes each and every "Many" you persist in your loop to have the same hash code, because GenerationType.IDENTITY can only assign identifiers when the INSERT statement executes, which happens during synchronization (flush/commit). This method might be causing the cache lookups, which occur while the provider traverses the growing object graph on each persist call due to the cascade persist, to take longer and longer.
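The collision effect can be seen in isolation with plain Java. This is a sketch, and BadMany/GoodMany are illustrative stand-ins rather than the asker's classes: hashing on an id that is still null before the INSERT runs puts every unsaved instance in the same hash bucket, while the default identity-based hashCode stays well distributed.

```java
import java.util.HashSet;
import java.util.Set;

// BadMany hashes on a database-generated id that is still null before flush;
// GoodMany keeps Object's default identity-based hashCode.
class BadMany {
    Long id; // stays null until an IDENTITY column assigns it at flush time

    @Override
    public int hashCode() {
        return id == null ? 0 : id.hashCode(); // every unsaved instance -> 0
    }
}

class GoodMany {
    Long id;
    // no hashCode/equals override: identity semantics, stable and distributed
}

public class HashDemo {

    // Count how many distinct hash codes n fresh unsaved instances produce.
    static int distinctBadHashes(int n) {
        Set<Integer> hashes = new HashSet<>();
        for (int i = 0; i < n; i++) hashes.add(new BadMany().hashCode());
        return hashes.size();
    }

    static int distinctGoodHashes(int n) {
        Set<Integer> hashes = new HashSet<>();
        for (int i = 0; i < n; i++) hashes.add(new GoodMany().hashCode());
        return hashes.size();
    }

    public static void main(String[] args) {
        System.out.println("bad:  " + distinctBadHashes(1000));  // 1: all unsaved instances collide
        System.out.println("good: " + distinctGoodHashes(1000)); // almost all distinct
    }
}
```

In practice this suggests either not overriding hashCode/equals on such entities at all, or basing them on an immutable business key; a hash code derived from a generated id also breaks the HashSet contract, because it changes after the entity has already been added to the Set.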