Tags: java, hibernate, bean-validation

Hibernate Validator 6.0: ListValueExtractor.extractValues seems to perform poorly with large lists


I am using Hibernate Validator 6.x. One of the objects I validate has a field containing a list with a container element constraint, for example List<@NotNull Double> doubles. When the list is very large, validation slows down substantially. To investigate, I moved the element checks into a custom constraint on the list itself, @ValidDoubles List<Double> doubles, whose validator iterates over the elements with a stream. That validation became roughly 65% faster.
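The workaround described above can be sketched as follows. This is a minimal sketch, assuming the custom @ValidDoubles constraint only needs to reject null elements; ValidDoublesCheck and its isValid method are hypothetical names, and the ConstraintValidator wiring is left out so the example runs with the JDK alone.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical core of a @ValidDoubles check: one stream pass over the list
 * instead of cascading into each element. In a real Bean Validation setup this
 * logic would live in a ConstraintValidator's isValid method (not shown here).
 */
public class ValidDoublesCheck {

    /** Returns true when the list is null (left to @NotNull) or has no null elements. */
    public static boolean isValid(List<Double> doubles) {
        if (doubles == null) {
            return true; // by convention, null handling is left to @NotNull
        }
        return doubles.stream().allMatch(Objects::nonNull);
    }

    public static void main(String[] args) {
        System.out.println(isValid(Arrays.asList(1.0, 2.5, 3.0)));  // true
        System.out.println(isValid(Arrays.asList(1.0, null, 3.0))); // false
    }
}
```

The whole list is checked in a single constraint, so no per-element bookkeeping is involved; the trade-off is that, by default, a violation points at the list property rather than at an individual index.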

After profiling the application, I can see that most of the time is spent in ListValueExtractor.extractValues, the Hibernate Validator class that extracts the elements of a List for container element validation. Could someone explain why this method seems to be so expensive, and whether there are any known workarounds?
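To see where a per-element cost could come from, here is a simplified, stand-alone model of what a list value extractor does: it walks the list and makes one receiver callback per element. The Receiver interface is a hypothetical stand-in shaped like Bean Validation's ValueExtractor.ValueReceiver#indexedValue(String, int, Object); neither the interface nor the loop is Hibernate Validator's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of list value extraction (hypothetical, JDK only). */
public class ListExtractionModel {

    /** Hypothetical stand-in for ValueExtractor.ValueReceiver. */
    public interface Receiver {
        void indexedValue(String nodeName, int index, Object value);
    }

    /** Hands every element to the receiver: one callback per element. */
    public static void extractValues(List<?> list, Receiver receiver) {
        int index = 0;
        for (Object element : list) {
            receiver.indexedValue("element", index++, element);
        }
    }

    public static void main(String[] args) {
        List<Double> doubles = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            doubles.add(i * 1.5);
        }
        // The loop itself is cheap; whatever the receiver does per element
        // (building paths, looking up constraints, cascading) is repeated N times.
        List<String> calls = new ArrayList<>();
        extractValues(doubles, (name, idx, value) -> calls.add(name + "[" + idx + "]=" + value));
        System.out.println(calls.size()); // 5
    }
}
```

Because the loop does very little, time a profiler attributes to extractValues (inclusive of callees) is likely to be mostly per-element work done inside the receiver.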

An example object:

import java.util.List;

import javax.validation.constraints.NotNull;

public class MyDataObject {

    private List<@NotNull Double> doubles; // list which can contain thousands of values

    // Getters and setters
}

Update

Upon further profiling and investigation, I believe the issue is related to how Hibernate Validator keeps track of which beans have already been validated during cascading validation, in particular its use of System.identityHashCode for that bookkeeping.
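The kind of bookkeeping described above can be modelled like this: every (bean, group) pair reached during cascading is recorded in a set keyed by object identity, so each cascaded list element adds one entry and at least one System.identityHashCode call. ProcessedUnit and IdentityTrackingModel are hypothetical names; this is a simplified illustration, not Hibernate Validator's implementation.

```java
import java.util.HashSet;
import java.util.Set;

/** Simplified model of "already validated for this group?" tracking (hypothetical). */
public class IdentityTrackingModel {

    /** Key combining a bean, compared by identity, with a validation group. */
    static final class ProcessedUnit {
        private final Object bean;
        private final Class<?> group;
        private final int hash;

        ProcessedUnit(Object bean, Class<?> group) {
            this.bean = bean;
            this.group = group;
            // Identity hash: equal-but-distinct beans are tracked separately,
            // and user-defined equals()/hashCode() are never invoked.
            this.hash = 31 * System.identityHashCode(bean) + group.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ProcessedUnit)) {
                return false;
            }
            ProcessedUnit other = (ProcessedUnit) o;
            return bean == other.bean && group == other.group;
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    private final Set<ProcessedUnit> processed = new HashSet<>();

    /** Returns true the first time a (bean, group) pair is seen, false afterwards. */
    public boolean markProcessed(Object bean, Class<?> group) {
        return processed.add(new ProcessedUnit(bean, group));
    }

    public int trackedUnits() {
        return processed.size();
    }

    /** Hypothetical marker standing in for a validation group. */
    interface DefaultGroup {}

    public static void main(String[] args) {
        IdentityTrackingModel tracker = new IdentityTrackingModel();
        String a = new String("x");
        String b = new String("x"); // equal to a, but a different instance
        System.out.println(tracker.markProcessed(a, DefaultGroup.class)); // true
        System.out.println(tracker.markProcessed(b, DefaultGroup.class)); // true (different identity)
        System.out.println(tracker.markProcessed(a, DefaultGroup.class)); // false (already seen)
        System.out.println(tracker.trackedUnits());                       // 2
    }
}
```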

Looking at my profiler, 11.6% of CPU time is spent validating the input beans, and 11.3% of CPU time is spent in calls to System.identityHashCode, so almost all of the validation time goes there. Interestingly, most of that time is spent on the second-level child objects (SecondChild), even though they only carry relatively simple constraints. I wonder whether I have misconfigured the validator or the beans, as this seems very odd.

My validator configuration looks like this:

<bean id="validator" class="org.springframework.validation.beanvalidation.LocalValidatorFactoryBean">
    <property name="validationPropertyMap">
        <util:map>
            <entry key="hibernate.validator.fail_fast" value="true"/>
        </util:map>
    </property>
</bean>

Validator invocation:

Set<ConstraintViolation<InputObject>> violations = validator.validate(input);

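Example object structure:

import java.util.List;

import javax.validation.Valid;
import javax.validation.constraints.NotBlank;
import javax.validation.constraints.NotNull;

public class InputObject {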
    @NotNull
    String name;

    @Valid
    List<FirstChild> firstChildren; // on average 10 objects, but can be very large

    // Getters and Setters
}

public class FirstChild {
    @SomeCustomValidator // Not important
    Integer someValue;

    // 3 to 4 further fields with simple validators

    @Valid
    List<SecondChild> secondChildren; // On average around 40 objects but can be very large

    // Getters and Setters
}

public class SecondChild {
    @NotBlank
    String foo;

    @NotBlank
    String bar;

    // Getters and Setters
}
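To put the object structure above in numbers, the following sketch counts how many beans a cascaded validation reaches, assuming one tracked unit per bean for the default group only. The classes are stripped-down, annotation-free stand-ins named after the ones above, and CascadeCount is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

/** Counts beans reached through the @Valid lists (hypothetical stand-in classes). */
public class CascadeCount {

    public static class SecondChild { String foo = "f"; String bar = "b"; }

    public static class FirstChild {
        List<SecondChild> secondChildren = new ArrayList<>();
    }

    public static class InputObject {
        List<FirstChild> firstChildren = new ArrayList<>();
    }

    /** Builds a graph with firstCount FirstChild objects, each with secondPerFirst children. */
    public static InputObject build(int firstCount, int secondPerFirst) {
        InputObject input = new InputObject();
        for (int i = 0; i < firstCount; i++) {
            FirstChild first = new FirstChild();
            for (int j = 0; j < secondPerFirst; j++) {
                first.secondChildren.add(new SecondChild());
            }
            input.firstChildren.add(first);
        }
        return input;
    }

    /** Counts the root plus every bean reached through the cascaded lists. */
    public static int countBeans(InputObject input) {
        int count = 1; // the root InputObject
        for (FirstChild first : input.firstChildren) {
            count += 1 + first.secondChildren.size();
        }
        return count;
    }

    public static void main(String[] args) {
        // Average sizes from the question: ~10 FirstChild, ~40 SecondChild each.
        System.out.println(countBeans(build(10, 40)));    // 411
        System.out.println(countBeans(build(100, 1000))); // 100101
    }
}
```

With the average sizes that is 1 + 10 × (1 + 40) = 411 beans per InputObject, most of them SecondChild instances, and the count grows with the product of the two list sizes, which would fit with where the profiler reports the time.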

In conclusion:

  • According to the profiler, the issue lies in the cascading validation triggered by the @Valid annotations on the lists.
  • The issue appears to be in how Hibernate Validator keeps track of which objects have already been validated during cascading validation.
  • The profiler shows System.identityHashCode as the method taking up most of the validation time.

Is this an optimization issue in Hibernate Validator, or could I configure my validator or structure my input objects in a way that would produce better performance?


Solution

  • Tough one.

    So the issue you are seeing is that we create a BeanGroupProcessedUnit for each list element, so when you have many elements, it doesn't scale well.

    You don't have the issue when you move the checks outside of the list, as we then only keep a single processed unit for the whole list.

    I'm not entirely sure there's an easy fix for this that doesn't break other use cases, but we should at least check whether we can improve the situation in your case.

    That being said, I would appreciate it if you could take the time to open an issue on our tracker, https://hibernate.atlassian.net/projects/HV/issues, with a reproducer based on https://github.com/hibernate/hibernate-test-case-templates/tree/master/validator. That would help us start the process.