Search code examples
javaspring-boothibernatejpaentitymanager

Entity Manager doesn't flush all entities


I have the following structure of my test application:

Entity:

@Data
@Entity
@Table(name = "content")
public class Content {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private int id;
    private String text;
}

Repository:

@Repository
public class ContentRepositoryCustomImpl implements ContentRepositoryCustom {
    @PersistenceContext
    private EntityManager entityManager;

    @Override
    public void updateLastChar(int batchSize, String lastChar) {
        TypedQuery<Long> countQuery = entityManager.createQuery("Select count(c.id) from Content c", Long.class);
        long count = countQuery.getSingleResult();

        TypedQuery<Content> query = entityManager.createQuery("Select c From Content c", Content.class);

        int pageNumber = 1;
        while (pageNumber < count) {
            query.setFirstResult(pageNumber - 1);
            query.setMaxResults(batchSize);

            List<Content> contents = query.getResultList();

            for (Content content : contents) {
                String temp = content.getText().substring(1);
                content.setText(temp + lastChar);
            }

            pageNumber += batchSize;
        }
    }
}

Test:

@ExtendWith(SpringExtension.class)
@SpringBootTest
class ContentRepositoryTest {

    @Autowired
    ContentRepository contentRepository;

    @Test
    @Transactional
    @Commit
    public void testOOMError2() throws InterruptedException {
        contentRepository.updateLastChar(30, "5");
    }
}

application.yml:

spring:
  datasource:
    username: user
    password: pass
    url: jdbc:postgresql://localhost:5432/pg?currentSchema=myschema&characterEncoding=UTF-8&serverTimezone=UTC
    driverClassName: org.postgresql.Driver
    type: com.zaxxer.hikari.HikariDataSource
    hikari:
      maximumPoolSize: 20
  jpa:
    properties:
      hibernate:
        hbm2ddl:
          auto: none
        dialect: org.hibernate.dialect.PostgreSQL9Dialect
        generate_statistics: true
    hibernate:
      naming:
        physical-strategy: org.hibernate.boot.model.naming.PhysicalNamingStrategyStandardImpl

DB:

CREATE TABLE etl.content
(
    id integer NOT NULL DEFAULT nextval('etl.content_seq'::regclass),
    text character varying(1048512) COLLATE pg_catalog."default",
    CONSTRAINT content_pkey PRIMARY KEY (id)
)

In the DB I have 100 records. The 'text' column is full (1048512 chars) for every row.

The statistics which I get after the test can change from time to time.

When all records were updated:

Session Metrics {
    1365887 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    9047519 nanoseconds spent preparing 104 JDBC statements;
    10297157763 nanoseconds spent executing 104 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    875610249 nanoseconds spent executing 1 flushes (flushing a total of 100 entities and 0 collections);
    8352008089 nanoseconds spent executing 5 partial-flushes (flushing a total of 180 entities and 180 collections)

When some records were updated (79 in the case below):

Session Metrics {
    1566421 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    8680213 nanoseconds spent preparing 104 JDBC statements;
    10392348834 nanoseconds spent executing 104 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    735904121 nanoseconds spent executing 1 flushes (flushing a total of 79 entities and 0 collections);
    8773113348 nanoseconds spent executing 5 partial-flushes (flushing a total of 158 entities and 158 collections)

I try to figure out why it is happening. When I change the flush mode to COMMIT everything works:

spring.jpa.properties.org.hibernate.flushMode=ALWAYS

However, it would be nice to understand the cause of the problem.

UPDATE 1:

I removed

entityManager.persist(content);

as Simon Martinelli suggested, but it still does not work(69 entities were updated instead of 100):

Session Metrics {
    11661208 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    18982347 nanoseconds spent preparing 105 JDBC statements;
    8964494681 nanoseconds spent executing 105 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    774310803 nanoseconds spent executing 1 flushes (flushing a total of 69 entities and 0 collections);
    7321581493 nanoseconds spent executing 5 partial-flushes (flushing a total of 138 entities and 138 collections)

UPDATE 2:

I added entityManager.flush():

public void updateLastChar(int batchSize, String lastChar) {
        TypedQuery<Long> countQuery = entityManager.createQuery("Select count(c.id) from Content c", Long.class);
        long count = countQuery.getSingleResult();

        TypedQuery<Content> query = entityManager.createQuery("Select c From Content c", Content.class);

        int pageNumber = 1;
        while (pageNumber < count) {
            query.setFirstResult(pageNumber - 1);
            query.setMaxResults(batchSize);

            List<Content> contents = query.getResultList();

            for (Content content : contents) {
                content.setText(lastChar);
            }
            entityManager.flush();

            pageNumber += batchSize;
        }
    }

But the problem is still here. Moreover, I have less (65) than 105 statements (this behavior also occurs from time to time):

Session Metrics {
    8190152 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    5825900 nanoseconds spent preparing 65 JDBC statements;
    39315983 nanoseconds spent executing 65 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    57656774 nanoseconds spent executing 5 flushes (flushing a total of 249 entities and 0 collections);
    3436917 nanoseconds spent executing 5 partial-flushes (flushing a total of 129 entities and 129 collections)

Solution

  • So after a couple of hours of suffering, I've finally found the reason: I don't use the ORDER BY clause and that`s why getResultList() sometimes returns duplicates.

    The fix:

    TypedQuery<Content> query = entityManager.createQuery("Select c From Content c ORDER BY c.id asc", Content.class);