Search code examples
javasortingtomcatjava-streamcollectors

Why is the same Comparator acting differently in unit tests vs. when run as a web app?


TL;DR: After much trial and error, it appears as though the issue is Tomcat related, maybe in regards to the configured java versions, rather than the java language itself. See 'Edit 3' below for more details.

I have been using Java 8 streams and Comparators for awhile now and have never seen this type of behavior before, so I'm asking out of curiosity to see if anyone can find what's wrong with my stream.

I'm working on "Java-8-ifying" a legacy project by replacing our antiquated collection processing with streams (I was asked why I'm doing this, and the short answer is that we're essentially re-writing the project, but only have the time budget to do it incrementally. I'm doing the first step - updating the java version. There is a lot of messy code around collection logic, so "Java-8-ifying" is serving to clean up a lot of code and making things easier to read and maintain). For now, we're still using the old data types, so any 'date's mentioned are dealing with java.util.Date instances instead of the new Java 8 types.

This is my Comparator in ServiceRequest.java (which is a POJO):

public static final Comparator<ServiceRequest> BY_ACTIVITY_DATE_DESC = Comparator.comparing(
        ServiceRequest::getActivityDate, Comparator.nullsLast(Comparator.reverseOrder()));

When unit tested, this Comparator works as expected. ServiceRequests with a later activityDate are first in the resulting list, ones with an earlier activityDate are further down the list, and ones with a null activityDate are at the bottom. For reference, here is a complete copy of the unit test:

@Test
public void testComparator_BY_ACTIVITY_DATE_DESC() {
    ServiceRequest olderRequest = new ServiceRequest();
    olderRequest.setActivityDate(DateUtil.yesterday());

    ServiceRequest newerRequest = new ServiceRequest();
    newerRequest.setActivityDate(DateUtil.tomorrow());

    ServiceRequest noActivityDateRequest = new ServiceRequest();

    List<ServiceRequest> sortedRequests = Arrays.asList(olderRequest, noActivityDateRequest, newerRequest).stream()
            .sorted(ServiceRequest.BY_ACTIVITY_DATE_DESC)
            .collect(Collectors.toList());

    assertEquals(sortedRequests.get(0), newerRequest);
    assertEquals(sortedRequests.get(1), olderRequest);
    assertEquals(sortedRequests.get(2), noActivityDateRequest);
}

Note: DateUtil is a legacy utility that creates java.util.Date instances for our testing purposes.

This test always passes with flying colors, as I would expect. However, I have a controller that assembles a list of open service requests and groups them by requester identifier and selects only the most recent request for that user into a map. I attempted to convert this logic into the given stream:

private Map<Long, ServiceRequestViewBean> ServiceRequestsByUser(List<ServiceRequest> serviceRequests) {
    return serviceRequests.stream()
            .sorted(ServiceRequest.BY_ACTIVITY_DATE_DESC)
            .collect(Collectors.toMap(
                    serviceRequest -> serviceRequest.getRequester().getId(),
                    serviceRequest -> new ServiceRequestViewBean(serviceRequest),
                    (firstServiceRequest, secondServiceRequest) -> firstServiceRequest)
            );
}

My logic was that after the requests were sorted by most recent requests first, whenever multiple requests made by the same user were processed, only the most recent one would be put into the map.

However the observed behavior was that the OLDEST request was being put in the map instead. Note: I've since verified that when the controller code is invoked via a jUnit test, the behavior is as expected; the erroneous behavior only exhibits when the endpoint on the controller is called while running on tomcat. See 'Edit 3' for more details

I added some peeks to view the ServiceRequest IDs (not the requester IDs, which in this case would be the same when the merge function is encountered) before sorting, after sorting, and in the merge function for the map collecting. For simplicity, I limited the data to a single requester's 4 requests.

The expected order of ServiceRequest IDs:

ID      ACTIVITY DATE
365668  06-JUL-18 09:01:44
365649  05-JUL-18 15:41:40
365648  05-JUL-18 15:37:43
365647  05-JUL-18 15:31:47

Output of my peeks:

Before Sorting: 365647
Before Sorting: 365648
Before Sorting: 365649
Before Sorting: 365668
After Sorting: 365647
After Sorting: 365648
First request: 365647, Second request: 365648
After Sorting: 365649
First request: 365647, Second request: 365649
After Sorting: 365668
First request: 365647, Second request: 365668

I thought the interspersion of the map merge output with the after-sort peek was interesting, but I guess since there were no more stateful-intermediate operations, it just decided to add things to the map as it was peeking at them.

Since the peek output was the same before and after sorting, I concluded that either the sorting had no effect on the encounter-order, or that the comparator was sorting in ascending order for some reason (contrary to intended design), and either the input from the database just happened to be in this order, or the stream resolved the sorting before either of the peeks (though I'm not sure that's possible...). Out of curiosity, I put a sort on the database call to see if it would change the outcome of this stream. I told the database call to sort by activity date descending so that the order of the input going into the stream would be guaranteed. If the comparator was somehow inverted, it should flip the order of the items back to ascending order.

However the output of the DB-ordered stream was much like the first, only the order held true to the original order produced by the database sort... which leads me to believe that my comparator is having absolutely NO effect on this stream.

My question is WHY is this? Does the toMap collector ignore encounter order? If so, why does this cause the sorted call to be ineffective? I thought that sorted, as a stateful intermediate step, forced subsequent steps to observe encounter order (with the exception of forEach, as there is a forEachOrdered).

When I looked up the javadoc for toMap, it had a note about concurrency:

The returned Collector is not concurrent. For parallel stream pipelines, the combiner function operates by merging the keys from one map into another, which can be an expensive operation. If it is not required that results are merged into the Map in encounter order, using toConcurrentMap(Function, Function, BinaryOperator) may offer better parallel performance.

This leads me to believe that encounter order SHOULD be preserved by the toMap collector. I'm very lost and confused as to why I'm observing this particular behavior. I know I can solve this issue by doing a date comparison in my merge function, but I seek to understand WHY my comparator seems to work when used with a toList collector, but not with a toMap collector.

Thank you in advance for your insight!

Edit 1: Many have suggested using a LinkedHashMap to solve the problem, so I implemented that solution like so:

return serviceRequests.stream()
            .sorted(ServiceRequest.BY_ACTIVITY_DATE_DESC)
            .collect(Collectors.toMap(
                    serviceRequest -> serviceRequest.getRequester().getId(),
                    serviceRequest -> new ServiceRequestViewBean(serviceRequest),
                    (serviceRequestA, serviceRequestB) -> serviceRequestA,
                    LinkedHashMap::new));

But when tested, it's actually resolving to an older one instead of the desired newest one that the comparator should be enforcing. I'm still confused. Note: I've since verified that the erroneous behavior only exhibits when run as a webapp on Tomcat. When this code is called via a jUnit test, it functions as one would expect it to. See 'Edit 3' for more details

Edit 2: Interestingly enough, when I implemented the solution I -thought- would work (sorting in the merge function) is also not working:

 return serviceRequests.stream()
            .sorted(ServiceRequest.BY_ACTIVITY_DATE_DESC)
            .collect(Collectors.toMap(
                    serviceRequest -> serviceRequest.getRequester().getId(),
                    serviceRequest -> new ServiceRequestViewBean(serviceRequest),
                    (firstServiceRequest, secondServiceRequest) -> {
                        return Stream.of(firstServiceRequest, secondServiceRequest)
                                .peek(request -> System.out.println("- Before Sort -\n\tRequester ID: "
                                + request.getRequester().getId() + "\n\tRequest ID: " + request.getId()))
                                .sorted(ServiceRequest.BY_ACTIVITY_DATE_DESC)
                                .peek(request -> System.out.println("- After sort -\n\tRequester ID: "
                                + request.getRequester().getId() + "\n\tRequest ID: " + request.getId()))
                                .findFirst().get();
            }));

Which produces the following output:

- Before Sort -
    Requester ID: 67200307
    Request ID: 365647
- Before Sort -
    Requester ID: 67200307
    Request ID: 365648
- After sort -
    Requester ID: 67200307
    Request ID: 365647
- Before Sort -
    Requester ID: 67200307
    Request ID: 365647
- Before Sort -
    Requester ID: 67200307
    Request ID: 365649
- After sort -
    Requester ID: 67200307
    Request ID: 365647
- Before Sort -
    Requester ID: 67200307
    Request ID: 365647
- Before Sort -
    Requester ID: 67200307
    Request ID: 365668
- After sort -
    Requester ID: 67200307
    Request ID: 365647

Note: I've since verified that this erroneous output is only produced when running as a web app on Tomcat. When the code is invoked via jUnit test, it functions correctly as one would expect. See 'Edit 3' for more details

This seems to indicate that my comparator really is doing nothing, or is aggressively sorting in the opposite order that it does in the unit test, or perhaps findFirst is doing the same thing as toMap was doing. However the javadoc for findFirst would suggest that findFirst respects encounter order when an intermediate step such as sorted is used.

Edit 3: I was asked to make a minimum, complete, and verifiable sample project, so I did: https://github.com/zepuka/ecounter-order-map-collect

I tried a couple different tactics to try and reproduce the issue (each tagged in the repo), but was unable to reproduce the erroneous behavior I've been getting in my controller. My first solution, and all suggestions I've since tried, have all produced the desired, correct behavior! So why then when the application is run do I get different behavior? For kicks and giggles, I exposed the method on my controller as public so I could unit test it and used the exact same data that's been giving me trouble during the run in the unit test - it functions normally in the jUnit test. There must be something different that allows this code to run correctly in unit tests and in a normal Java main method, but incorrectly when run on my tomcat server.

The Java version I'm compiling with and running my server on are the same, though: 1.8.0_171-b11 (Oracle). At first I was building and running from within Netbeans, but I did a command line build and tomcat startup just to be sure there wasn't some weird Netbeans setting interfering. When I look at the run properties in netbeans though, it does saying it's using a 'Java EE 7 Web' as the Java EE version alongside the server setting (which is Apache Tomcat 8.5.29, running java 8), and I'll admit I have no idea what the 'Java EE version' is all about.

So I've added the Tomcat tag to this post as my problem seems to be Tomcat-related rather than Java language-related. At this point, the only way to solve my problem seems to be using a non-stream approach to building the map, but I would still love to know what people's thoughts are on what configuration I could look into to fix the issue.

Edit 4: I tried to get around the problem by using the old ways of doing things, and when I avoid using a Comparator in a Stream, things are fine, but as soon as I introduce a Comparator to a Stream anywhere in the process, the web app fails to behave correctly. I tried processing the list without a stream, and only using a stream on the two requests to use the comparator when merging into the map, but that doesn't work. I tried just making the Comparator old fashioned with an inline class definition, using plain old java instead of Comparator.comparing, but using that in a stream fails. Only when I'm avoiding the streams and comparator entirely does it seem to work.


Solution

  • Finally got to the bottom of it!

    I was able to isolate the issue by first applying the new comparator to everywhere that needed it. I was able to observe that most of these behaved as I would expect, and only on a certain page was the problem surfacing.

    My prior debugging output only included the IDs, but this time I included the activity dates for convenience, and when I hit that one JSP, they were null!

    The root of the problem is that in ONE case in a single DAO method (which did some regex parsing to call different internal methods - yeah, it's a mess), it wasn't using the row mapper I had checked before... this particular helper mehod contained a nefarious inline 'row mapper' that primitively used a loop and index to get the results of the query and put them in the object, and it was missing the activity date column. Seems that the history of the development of this particular page (as documented by the commit history I dug through), suffered from performance issues, so when they 'improved performance', they made that inline row mapper to only include the most crucial pieces of data needed at the time. And it turns out that it was the only page that fell into that particular branch of regular expression logic, which I had not previously noticed.

    And that's just another reason why this project is earning the 'worst case of spaghetti code' award. This one was particularly hard to track down as it was impossible to confirm if a 'working' result was truly working or if it just happened to work that time because order wasn't guaranteed from the database, nor when all the dates were null.

    TL;DR: Not tomcat's fault, but a rogue inline row mapper in a corner branch of DAO logic only triggered by a specific JSP