Search code examples
javahibernatejpajpql

Should I include distinct in this JPQL query?


Background

I have seen multiple answers and questions here on SO and in many popular blogs about the necessity of the distinct keyword in JPQL JOIN FETCH queries and about the PASS_DISTINCT_THROUGH query hint.

For example, see these two questions

and these blog posts

What I am missing

Now my problem is that I cannot fully understand when exactly the distinct keyword must be included in the JPQL query. More specifically, if it depends on which method is used to execute the query (getResultList or getSingleResult).

The following is an example to clarify what I mean.

Everything I am writing from now on was tested on Ubuntu Linux 18.04, with Java 8, Hibernate 5.4.13 and an in-memory H2 database (version 1.4.200).

Suppose I have a Department entity which has a lazy bidirectional one to many relationship with a DepartmentDirector entity:

// Department.java
@Entity
public class Department {
    // ...
    private Set<DepartmentDirector> directors;

    @OneToMany(mappedBy = "department", fetch = FetchType.LAZY)
    public Set<DepartmentDirector> getDirectors() {
        return directors;
    }
    // ...
}

// DepartmentDirector.java
@Entity
public class DepartmentDirector {
    // ...
    private Department department

    @ManyToOne
    @JoinColumn(name = "department_fk")
    public Department getDepartment() {
        return department;
    }
    // ...
}

Suppose that my database currently contains one department (department1) and two directors associated with it.

Now I want to retrieve the department by its uuid (the primary key), along with all its directors. This can be done with the following JOIN FETCH JPQL query:

String query = "select department from Department department left join fetch "
             + "department.directors where department.uuid = :uuid";

As the preceding query uses a join fetch with a child collection, I expected it to return two duplicated departments when issued: however this only happens when using the query with the getResultList method and not when using the getSingleResult method. This is somehow reasonable, but I have found that the Hibernate implementation of getSingleResult uses getResultList behind the curtains so I expected a NonUniqueResultException to be thrown.

I also briefly went through JPA 2.2 specification but no distinction in treating the duplicates between the two methods is mentioned, and every code sample concerning this issue uses the getResultList method.

Conclusion

In my example I found out that JOIN FETCH queries executed with getSingleResult do not suffer the duplicated entities problem explained in the resources I linked in the section Background.

If the above claim would be correct, it would mean that the same JOIN FETCH query would need distinct if executed with getResultList, but would not need it when executed with getSingleResult.

I need someone to explain me if this is expected or if I misunderstood something.


Appendix

Results of the two queries:

  1. Query ran with the getResultList method. I get two duplicated departments as expected (this was done just to test the behaviour of the query, getSingleResult should be used instead for this):

    List<Department> resultList = entityManager.createQuery(query, Department.class)
            .setParameter("uuid", department1.getUuid())
            .getResultList();
    
    assertThat(resultList).containsExactly(department1, department1); // passes
    
  2. Query ran with the getSingleResult method. I would expect the same duplicated departments to be retrieved, and thus a NonUniqueResultException to be thrown. Instead, a single department is retrieved and everything works nice:

    Department singleResult = entityManager.createQuery(query, Department.class)
            .setParameter("uuid", department1.getUuid())
            .getSingleResult();
    
    assertThat(singleResult).isEqualTo(department1); // passes
    

Solution

  • Interesting question.

    First of all let me point out that getSingleResult() was meant for queries that due to their nature always return a single result (meaning: mostly aggregate queries like SELECT SUM(e.id) FROM Entity e). A query that you think, based on some business domain-specific rule, should return a single result, does not really qualify.

    That being said, the JPA Spec states that getSingleResult() should throw NonUniqueResultException when the query returns more than one result:

    The NonUniqueResultException is thrown by the persistence provider when Query.getSingleResult or TypedQuery.getSingleResult is invoked and there is more than one result from the query. This exception will not cause the current transaction, if one is active, to be marked for rollback.

    However, looking at the Hibernate implementation:

        @Override
        public R getSingleResult() {
            try {
                final List<R> list = list();
                if ( list.size() == 0 ) {
                    throw new NoResultException( "No entity found for query" );
                }
                return uniqueElement( list );
            }
            catch ( HibernateException e ) {
                if ( getProducer().getFactory().getSessionFactoryOptions().isJpaBootstrap() ) {
                    throw getExceptionConverter().convert( e );
                }
                else {
                    throw e;
                }
            }
        }
    
        public static <R> R uniqueElement(List<R> list) throws NonUniqueResultException {
            int size = list.size();
            if ( size == 0 ) {
                return null;
            }
            R first = list.get( 0 );
            for ( int i = 1; i < size; i++ ) {
                if ( list.get( i ) != first ) {
                    throw new NonUniqueResultException( list.size() );
                }
            }
            return first;
        }
    

    it turns out Hibernate's interpretation of 'more than one result' seems to be 'more than one unique result'.

    In fact, I tested your scenario with all JPA providers, and it turns out that:

    • Hibernate does indeed return duplicates from getResultList(), but does not throw the exception due to the peculiar way getSingleResult() is implemented
    • EclipseLink is the only one that does not suffer from the duplicate result bug in getResultList() and consequently, getSingleResult() does not throw an exception, either (to me, this behaviour is only logical, but as it turns out, it is all a matter of interpretation)
    • OpenJPA and DataNucleus both return duplicate results from getResultList() and throw an exception from getSingleResult()

    Tl;DR

    I need someone to explain me if this is expected or if I misunderstood something.

    It really boils down to how you interpret the specification