I am supporting another vendor's legacy application.
This is a J2EE application that runs on Glassfish v3.1.2.2. It has a REST API implemented using JAX-RS. I have limited visibility into the application and its source.
The symptoms are:
This happens both for remote calls and for calls made using curl on localhost.
If we make the same requests to a different port over HTTPS, they succeed. We are reluctant to move the calls to that other port without knowing the root cause. The requests failed intermittently last night and now fail constantly.
A packet capture of the request shows:
- TCP overhead/handshake
- A GET request
- A single ACK from the application back to the caller
- then nothing after that
What would cause Glassfish v3 to successfully handle and process an HTTP request but return no data?
Is there a mechanism in Glassfish v3 to flush or reset an HTTP listener and its associated thread pool?
Since this also happens with a curl request made on the server itself to localhost, I think I can rule out the network as the issue.
The ports being used communicate directly with Glassfish. There is no proxy (like Apache or Nginx) between the caller and the app server.
Are there logging or monitoring settings I should be enabling in Glassfish to observe what the HTTP listener is doing relative to the application and the network stack?
I have obfuscated some examples that show the symptoms:
Glassfish Access log:
"0:0:0:0:0:0:0:1" "NULL-AUTH-USER" "25/Oct/2018:11:21:02 -0500" "GET /api/obfuscated/by/me HTTP/1.1" 200 9002
Curl response for that same call:
* Trying OFBBFUSCATED
* Connected to hostname.local (OFBBFUSCATED) port 11080 (#0)
> GET /api/obfuscated/by/me HTTP/1.1
> Host: hostname.local:11080
> User-Agent: curl/7.43.0
> Accept: */*
> Authorization: Basic asdfdsfsdfdsfsdafsdafsdafw==
>
* Empty reply from server
* Connection #0 to host hostname.local left intact
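To tell an "empty reply" (the server closes the connection without sending a byte) apart from a hang or a reset, a small probe along these lines might help. It is only a sketch: the function name `probe_http`, its parameters, and the outcome labels are made up for illustration, and it talks plain HTTP over a raw socket rather than going through curl.

```python
import socket
import time


def probe_http(host, port, path="/", timeout=40.0, auth_header=None):
    """Send one GET and report how the server ends the exchange.

    Returns (outcome, seconds, first_bytes), where outcome is one of:
      "response"    - the server sent at least one byte
      "empty_reply" - the server closed the connection without sending data
      "reset"       - the connection was reset before any data arrived
      "timeout"     - nothing arrived within `timeout` seconds
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Accept: */*",
        "Connection: close",
    ]
    if auth_header:  # hypothetical parameter, e.g. "Basic <credentials>"
        lines.append(f"Authorization: {auth_header}")
    request = ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        try:
            # An empty bytes object here means the peer closed cleanly (FIN).
            data = sock.recv(4096)
        except socket.timeout:
            return "timeout", time.monotonic() - start, b""
        except ConnectionResetError:
            return "reset", time.monotonic() - start, b""

    elapsed = time.monotonic() - start
    if data:
        return "response", elapsed, data
    return "empty_reply", elapsed, b""
```

Calling `probe_http("hostname.local", 11080, "/api/obfuscated/by/me")` would correspond to the curl call above. The elapsed time is the useful part: an "empty_reply" that consistently arrives after about the listener's timeout would suggest the request is being accepted but never answered before the listener gives up.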
UPDATE: I changed a timeout setting for the HTTP network listener. I bumped it from 30 to 35 seconds because a packet capture showed the app sending a FIN after 30 seconds. After I made this change, the requests started to work again.
It is not clear if this somehow flushed or reset something or if I had some kind of race condition.
The apparent root cause was high I/O on the system running these services. The applications normally used 50MB/sec; a new process drove that usage to 250MB/sec. Once the I/O problem was resolved, all of the HTTP errors went away and haven't come back.