java spring-boot performance hibernate jpa

How to avoid the N +1 SELECT Problem for nested entities with Hibernate

I've been developing a Spring REST API using Spring Data JPA. I learned that you can easily run into the N + 1 SELECT problem which basically causes a SELECT for your parent entities and then N SELECTs for each child entity of that parent which is a performance problem.

Furthermore, I noticed that entity graphs seem like a good way of specifying which entities should be fetched in combination with the parent entity in usually a single query or much less queries (therefore avoiding the N + 1 problem). However, the problem I have is that my entities are deeply nested and the query for my parent entity should avoid the N + 1 problem for all child entities.

Assuming I'm modeling my entities like a card game between 2 players as follows:

@Entity
@Getter
@Setter
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class Duel
{
    @Id
    @JsonIgnore
    @GeneratedValue(strategy = SEQUENCE, generator = "seq_duel_generator")
    @SequenceGenerator(name = "seq_duel_generator", sequenceName = "seq_duel")
    private Long id;

    @OneToOne(cascade = CascadeType.PERSIST, orphanRemoval = true)
    private Player firstPlayer;

    @OneToOne(cascade = CascadeType.PERSIST, orphanRemoval = true)
    private Player secondPlayer;

    // ...
}

@Entity
@Getter
@Setter
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class Player
{
    @Id
    @JsonIgnore
    @GeneratedValue(strategy = SEQUENCE, generator = "seq_player_generator")
    @SequenceGenerator(name = "seq_player_generator", sequenceName = "seq_player")
    private Long id;

    @OneToOne(cascade = {CascadeType.PERSIST, CascadeType.REMOVE})
    @JoinColumn(name = "deck_id")
    private Deck deck;

    // ...
}

@Entity
@Getter
@Setter
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class Deck
{
    @Id
    @JsonIgnore
    @GeneratedValue(strategy = SEQUENCE, generator = "seq_deck_generator")
    @SequenceGenerator(name = "seq_deck_generator", sequenceName = "seq_deck")
    private Long id;

    @LazyCollection(LazyCollectionOption.FALSE)
    @OneToMany(cascade = CascadeType.PERSIST, orphanRemoval = true)
    private List<DeckCard> mainDeck;

    @LazyCollection(LazyCollectionOption.FALSE)
    @OneToMany(cascade = CascadeType.PERSIST, orphanRemoval = true)
    private List<DeckCard> extraDeck;

    @LazyCollection(LazyCollectionOption.FALSE)
    @JsonIgnore
    @OneToMany(cascade = CascadeType.PERSIST, orphanRemoval = true)
    private List<DeckCard> sideDeck;

    @OneToOne(mappedBy = "deck", cascade = CascadeType.PERSIST)
    @JsonIgnore
    private Player player;
}

// ...

The query I want to optimize has the following signature:
List<Duel> findAllByGameModeAndSeasonId(final GameMode gameMode, final int seasonId);

I tried defining a named entity graph by decorating the class as follows:

@NamedEntityGraph(name = "Duel.withPlayers",
        attributeNodes = {
                @NamedAttributeNode(value = "firstPlayer"),
                @NamedAttributeNode(value = "secondPlayer")
        }
)

Then I decorated the repository method as follows:
@EntityGraph(value = "Duel.withPlayers", type = EntityGraph.EntityGraphType.LOAD)

In this case, that didn't make a difference between using the entity graph and not using the entity graph, the generated SQL by Hibernate kept querying statements such as:

Query:["select s1_0.deck_id,s1_1.id,c1_0.id,a1_0.id,a1_0.name,c1_0.card_id,c1_0.name,s1_1.rarity from deck_side_deck s1_0 join deck_card s1_1 on s1_1.id=s1_0.side_deck_id left join card c1_0 on c1_0.id=s1_1.card_id left join archetype a1_0 on a1_0.id=c1_0.archetype_id where s1_0.deck_id=?"]
Query:["select m1_0.deck_id,m1_1.id,c1_0.id,a1_0.id,a1_0.name,c1_0.card_id,c1_0.name,m1_1.rarity from deck_main_deck m1_0 join deck_card m1_1 on m1_1.id=m1_0.main_deck_id left join card c1_0 on c1_0.id=m1_1.card_id left join archetype a1_0 on a1_0.id=c1_0.archetype_id where m1_0.deck_id=?"]
Params:[(20)]
Query:["select e1_0.deck_id,e1_1.id,c1_0.id,a1_0.id,a1_0.name,c1_0.card_id,c1_0.name,e1_1.rarity from deck_extra_deck e1_0 join deck_card e1_1 on e1_1.id=e1_0.extra_deck_id left join card c1_0 on c1_0.id=e1_1.card_id left join archetype a1_0 on a1_0.id=c1_0.archetype_id where e1_0.deck_id=?"]
Params:[(20)]

Now it seems like I need to specify sub graphs but nope, that also didn't help:

@NamedEntityGraph(name = "Duel.withPlayers",
        attributeNodes = {
                @NamedAttributeNode(value = "firstPlayer", subgraph = "playerGraph"),
                @NamedAttributeNode(value = "secondPlayer", subgraph = "playerGraph")
        },
        subgraphs = {
                @NamedSubgraph(name = "playerGraph",
                        attributeNodes = {
                                @NamedAttributeNode(value = "deck", subgraph = "deckGraph")
                        }
                ),
                @NamedSubgraph(name = "deckGraph",
                        attributeNodes = {
                                @NamedAttributeNode(value = "mainDeck"),
                                @NamedAttributeNode(value = "extraDeck"),
                                @NamedAttributeNode(value = "sideDeck")
                        }
                )
        }
)

The thrown error message is: org.hibernate.loader.MultipleBagFetchException: cannot simultaneously fetch multiple bags: [Deck.extraDeck, Deck.mainDeck]. Even the @LazyCollection annotations didn't fix that error. I can't simply convert my Lists to Sets.

Of course, I don't want to write any custom SQL if I'm already using Hibernate. Does it help to define entity graphs over each entity to force left joins? What can I do to optimize this query and maybe also avoid situations where my performance would be strongly affected when working with my Hibernate entities? It's very important for my database to efficiently handle 1000s or more records and currently that doesn't seem to be the case...

Solution

IMO, the importance of N + 1 problem is overestimated, especially in case when we need to fetch associated collections. In your particular case you believe that it would be "more optimal" to fetch DeckCard entities together with Deck entity using single SQL-query instead of issuing three different SQL-queries, let's take a look what is really going on...

The single SQL-query should look like:

select deck.*, mainDeck.*, extraDeck.*, sideDeck.*
from deck
left join DeckCard mainDeck on (mainDeck.deck_id=deck.id)
left join DeckCard extraDeck on (extraDeck.deck_id=deck.id)
left join DeckCard sideDeck on (sideDeck.deck_id=deck.id)

and the questions is: How many rows does that query return? If there are 54 cards in a deck, we may expect the query mentioned above returns 157464 rows (since every extra join of DeckCard multiplies rows in resultset we expect to get 54*54*54=157464 rows). So, instead of issuing three very simple queries, each of them returns just 54 rows (162 rows in total), the alternative is to issue single monster query and after that somehow process (remove duplicates, construct entities, etc) 157464 rows. Now, are you really sure the second option is better from performance perspective?

You need to pay attention to batch fetching option