Search code examples
javaspringhibernatejpajpql

How to avoid join 'explosions' in JPA with Hibernate when fields from multiple tables are needed?


Let's say I have the following method in a JpaRepository:

@EntityGraph(value = "Subject.allJoins", type = EntityGraphType.FETCH)
    @Query("select s from Subject s" + FIND_QUERY_WHERE)
    Page<Subject> findInProject(@Param("projectId") UUID projectId, <additional params>

As you can see I already use an EntityGraph with the joins I need. The resulting SQL query that Hibernate generates is as follows (with the where mostly omitted)

select
    subject0_.id,
    <all kinds of fields including some duplicates>
from
    subject subject0_
    left outer join project project1_ on subject0_.project_id = project1_.id
    left outer join subject_property_value properties2_ on subject0_.id = properties2_.subject_id
    left outer join property_value propertyva3_ on properties2_.property_value_id = propertyva3_.id
    left outer join ingestion_id_mapping ingestedme4_ on subject0_.id = ingestedme4_.subject_id
where
    subject0_.project_id = '123'
order by
    subject0_.name asc

Because of all the joins here that multiply the result for the number of rows as the result of a join, the result set explodes into hundreds of thousands of rows even if the total amount of subjects is just a few hundred.

Note that I will be making a projection and that will already avoid selecting some of the fields, but the joins are still needed.

What can I do to optimize this?

Note I do actually need all the data to immediately serialize to a client, so just leaving it to Hibernate by getting the model entity and using Getter methods for each association takes a lot longer than even this.

My current idea is that I have to do the query multiple times with the same where for each separate join and then later merge the results into a single object. It's not the end of the world if I read more or fewer rows on a subsequent query because of added or deleted rows in the original table because I can just take the smallest subset of subject ids and make a result from that.

But is there something smarter and/or simpler to do than that?


Solution

  • I'll take the example of a football Club that has a Country, a Stadium and a list of Players.

    Your first query should only be used to filter the rows that you want from the database. At this occasion, you can also fetch the 1:1 relationships, but not the 1:n. So, in my example, the 1st query should :

    • filter all the clubs matching the criteria
    • fetch all 1:1 relationship (country & stadium for each club).

    Then, you can make a dedicated for each sublist. Still in my example, you would select every players whose club is in a list you provide as parameter of the query (being the result of your first query). The query would be something like :

     String jpql = "select p from Player p where p.club in :clubs";
    

    Doing this, you can also provide entityGraph to load attributes of the players. That works well when you proceeed with pagination (the result of the 1st query will not be to important).

    This way to do is well described by Vlad Mihalcea : The best way to fix the Hibernate MultipleBagFetchException

    I deeply advise you to take a look.