I have the following setup (as shown in the attached image):
A (java process) -> B (kubernetes ambassador proxy) -> C (java service in kubernetes pod)
Communication between A and B uses HTTPS; Ambassador then terminates TLS and talks plain HTTP to C.
The problem I'm experiencing is that sometimes the HTTP body is not transmitted completely between A and B: the trace on the A side shows that everything was sent OK, but the trace on the B side shows that the transfer stopped partway for some reason. The Java process in C, which is waiting for the B proxy to forward all the data, then just waits and times out after 30 seconds.
In the attached image you can see that the A-side trace shows the whole body being sent, while the B-side trace shows only half of the body arriving (delivered). I suspect the "TCP Previous segment not captured" entries.
You can also see that after this it just waits for 30 seconds and then times out.
This happens pretty often in my setup. Does anyone know what the problem could be?
Ambassador config:
getambassador.io/config: |
  ---
  apiVersion: ambassador/v1
  kind: TLSContext
  name: tls
  ambassador_id: some-stg
  secret: ambassador-tls-cert
  ---
  apiVersion: ambassador/v1
  kind: Module
  name: ambassador
  ambassador_id: some-stg
  config:
    service_port: 8443
    diagnostics:
      enabled: true
    envoy_log_type: json
  ---
  apiVersion: ambassador/v1
  kind: Module
  name: tls
  ambassador_id: some-stg
  config:
    server:
      enabled: True
      redirect_cleartext_from: 8080
      alpn_protocols: "h2, http/1.1"
      secret: ambassador-tls-cert
  ---
  apiVersion: ambassador/v1
  kind: TracingService
  name: tracing
  service: tracing-jaeger-collector.tracing:9411
  driver: zipkin
  ambassador_id: some-stg
  tag_headers:
    - :authority
    - :path
UPDATE
Here are the traces on CloudShark:
A dump (sending side, outside Kubernetes): https://www.cloudshark.org/captures/8cfad383c8fb
B dump (Kubernetes Ambassador proxy, receiving side): https://www.cloudshark.org/captures/50512920d898
It seems my colleague found what was wrong. There is an AWS load balancer in front of the Ambassador pod, and after he recreated it, everything works without any problem. My guess is that the old load balancer was ACKing data to the client (A) but not passing all of it on to the Ambassador pod (B). He recreated the load balancer as a different type (NLB), since the Classic one was not working.
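For reference, a minimal sketch of how an NLB can be requested through the Kubernetes Service that fronts Ambassador (the service name, ports, and selector below are illustrative assumptions, not copied from my setup; the annotation itself is the standard in-tree AWS cloud provider one):

apiVersion: v1
kind: Service
metadata:
  name: ambassador  # hypothetical name
  annotations:
    # Ask AWS for a Network Load Balancer instead of the default Classic ELB.
    # An NLB passes TCP through at layer 4, so it does not ACK data on the
    # backend's behalf the way a proxying Classic ELB can.
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  ports:
    - name: https
      port: 443
      targetPort: 8443
  selector:
    service: ambassador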