Search code examples
spring-boothibernatejpalucenehibernate-search

Why does Hibernate Search app take so much time to start?


I'm working on a Spring Boot App using Hibernate Search 5.11.10 Final and Lucene. Starting the app takes up to 2 minutes:

2022-07-25 14:53:05.381  INFO 16884 --- [  restartedMain] zuers.Application          : Starting Application using Java 11.0.14.1 on CVDE043 with PID 16884 (C:\blablapath\target\classes started by ladybug in C:\blablapath)
2022-07-25 14:53:05.381  INFO 16884 --- [  restartedMain] zuers.Application          : No active profile set, falling back to default profiles: default
2022-07-25 14:53:05.426  INFO 16884 --- [  restartedMain] .e.DevToolsPropertyDefaultsPostProcessor : Devtools property defaults active! Set 'spring.devtools.add-properties' to 'false' to disable
2022-07-25 14:53:05.426  INFO 16884 --- [  restartedMain] .e.DevToolsPropertyDefaultsPostProcessor : For additional web related logging consider setting the 'logging.level.web' property to 'DEBUG'
2022-07-25 14:53:06.166  INFO 16884 --- [  restartedMain] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2022-07-25 14:53:06.224  INFO 16884 --- [  restartedMain] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 49 ms. Found 1 JPA repository interfaces.
2022-07-25 14:53:06.795  INFO 16884 --- [  restartedMain] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8081 (http)
2022-07-25 14:53:06.805  INFO 16884 --- [  restartedMain] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2022-07-25 14:53:06.806  INFO 16884 --- [  restartedMain] org.apache.catalina.core.StandardEngine  : Starting Servlet engine: [Apache Tomcat/9.0.54]
2022-07-25 14:53:06.900  INFO 16884 --- [  restartedMain] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2022-07-25 14:53:06.901  INFO 16884 --- [  restartedMain] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 1474 ms
2022-07-25 14:53:06.927  INFO 16884 --- [  restartedMain] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Starting...
2022-07-25 14:53:07.213  INFO 16884 --- [  restartedMain] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Start completed.
2022-07-25 14:53:07.220  INFO 16884 --- [  restartedMain] o.s.b.a.h2.H2ConsoleAutoConfiguration    : H2 console available at '/h2-console'. Database available at 'jdbc:h2:~/zuers-db-komplett'
2022-07-25 14:53:07.400  INFO 16884 --- [  restartedMain] o.hibernate.jpa.internal.util.LogHelper  : HHH000204: Processing PersistenceUnitInfo [name: default]
2022-07-25 14:53:07.488  INFO 16884 --- [  restartedMain] org.hibernate.Version                    : HHH000412: Hibernate ORM core version 5.4.32.Final
2022-07-25 14:53:07.653  INFO 16884 --- [  restartedMain] o.hibernate.annotations.common.Version   : HCANN000001: Hibernate Commons Annotations {5.1.2.Final}
2022-07-25 14:53:07.870  INFO 16884 --- [  restartedMain] org.hibernate.dialect.Dialect            : HHH000400: Using dialect: org.hibernate.dialect.H2Dialect
2022-07-25 14:53:08.307  INFO 16884 --- [  restartedMain] org.hibernate.search.engine.Version      : HSEARCH000034: Hibernate Search 5.11.10.Final
2022-07-25 14:54:59.745  INFO 16884 --- [  restartedMain] o.h.e.t.j.p.i.JtaPlatformInitiator       : HHH000490: Using JtaPlatform implementation: [org.hibernate.engine.transaction.jta.platform.internal.NoJtaPlatform]
2022-07-25 14:55:00.155  INFO 16884 --- [  restartedMain] j.LocalContainerEntityManagerFactoryBean : Initialized JPA EntityManagerFactory for persistence unit 'default'
2022-07-25 14:55:00.838  WARN 16884 --- [  restartedMain] JpaBaseConfiguration$JpaWebConfiguration : spring.jpa.open-in-view is enabled by default. Therefore, database queries may be performed during view rendering. Explicitly configure spring.jpa.open-in-view to disable this warning
2022-07-25 14:55:01.424  WARN 16884 --- [  restartedMain] ion$DefaultTemplateResolverConfiguration : Cannot find template location: classpath:/templates/ (please add some templates or check your Thymeleaf configuration)
2022-07-25 14:55:01.567  INFO 16884 --- [  restartedMain] o.s.b.d.a.OptionalLiveReloadServer       : LiveReload server is running on port 35729
2022-07-25 14:55:01.605  INFO 16884 --- [  restartedMain] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8081 (http) with context path ''
2022-07-25 14:55:01.614  INFO 16884 --- [  restartedMain] zuers.Application          : Started Application in 116.603 seconds (JVM running for 117.676)

My question is: What is happening between HSEARCH000034: Hibernate Search 5.11.10.Final and HHH000490: Using JtaPlatform implementation: [org.hibernate.engine.transaction.jta.platform.internal.NoJtaPlatform]? And is there any way this startup can go by faster?

Here is my applications.properties:

server.port=8081
spring.jpa.hibernate.ddl-auto=update
spring.h2.console.enabled=true
spring.datasource.url=jdbc:h2:~/zuers-db-komplett;AUTO_SERVER=TRUE
spring.jpa.properties.hibernate.search.lucene_version LATEST
spring.jpa.properties.hibernate.search.default.directory_provider = filesystem
spring.jpa.properties.hibernate.search.default.indexBase = ./var/lucene/indexes
spring.jpa.properties.hibernate.search.indexing_strategy = manual
spring.jpa.properties.hibernate.search.lucene.analysis_definition_provider = zuers.analyzer.AnalyzerProvider
spring.jpa.properties.hibernate.search.analyzer= strasseAnalyzer

Note: the database I'm working with has a size of 4GB


Solution

  • During application startup, the Lucene indexer will create or update an index at the filesystem location:

    /var/lucene/indexes

    Basically two effects might be the cause for the observed loss of time:

    • (a) the size of the data source, ie number of tuples and/or attributes to be processed, or
    • (b) the IO throughput of the filesystem you are reading from or writing to.

    Consider checking for IO bottlenecks (b), and if that can be ruled out that you don’t re-create the index with (massive amounts of) data (a) every time the application starts. As the documentation warns, keep in mind that

    reindexing is a potentially long-running task that should be triggered explicitly. To populate indexes with pre-existing data from the database, use mass indexing.

    and that

    dropping the schema means losing indexed data.

    Most likely, your custom or manual indexing strategy does one of the above: every single time the application starts.

    Note: The hint in the other answer by @yrodiere should guide you to rule this option out.