We have an IoT device that connects to our MQTT broker behind an NLB. We keep the connection between the IoT device and the broker open by using the MQTT Keep Alive time and the broker's heartbeat interval.
Our IoT device sleeps most of the time. It wakes up in the following situations:
When it sends a PINGREQ to the broker (every 340s, the MQTT Keep Alive time).
When other microservices publish data and the broker forwards it to the IoT device.
Our objective is to keep the IoT device asleep as much as possible while maintaining the connection, in order to save battery.
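For readers less familiar with MQTT, here is a minimal sketch of where the 340s value lives on the wire. It assumes MQTT 3.1.1 with a clean session and no authentication; the client ID `iot-device-1` and the function names are made up for illustration and are not taken from our device firmware. The Keep Alive is a 16-bit big-endian number of seconds in the CONNECT packet, and PINGREQ/PINGRESP are fixed two-byte packets.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    """Encode the MQTT 'remaining length' field as a variable-length integer."""
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n > 0:
            byte |= 0x80  # continuation bit: more length bytes follow
        out.append(byte)
        if n == 0:
            return bytes(out)


def build_connect(client_id: str, keep_alive_s: int) -> bytes:
    """Build a minimal MQTT 3.1.1 CONNECT packet (clean session, no auth)."""
    cid = client_id.encode("utf-8")
    variable_header = (
        struct.pack("!H", 4) + b"MQTT"       # protocol name, length-prefixed
        + bytes([0x04])                      # protocol level 4 = MQTT 3.1.1
        + bytes([0x02])                      # connect flags: clean session only
        + struct.pack("!H", keep_alive_s)    # Keep Alive in seconds, big-endian
    )
    payload = struct.pack("!H", len(cid)) + cid  # client ID, length-prefixed
    body = variable_header + payload
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


# Keep-alive ping and its reply are fixed two-byte packets.
PINGREQ = bytes([0xC0, 0x00])
PINGRESP = bytes([0xD0, 0x00])

# Hypothetical client ID; 340 is the Keep Alive value from this post.
packet = build_connect("iot-device-1", 340)
print(packet.hex())
```

With this Keep Alive, the device only needs to wake up to send the two-byte PINGREQ once per 340s period when nothing else is happening.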
Problem: The IoT device is supposed to stay asleep as much as possible while maintaining its connection to the MQTT broker. Instead, it usually starts waking up every 20s once it has received downstream data from the broker.
Based on our vendor's packet analysis, we found that the NLB sends 120 bytes of TCP keep-alive packets to the IoT device every 20s, starting right after the broker publishes downstream data. These packets are sent by the NLB itself, not by the broker.
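To put the battery impact in numbers, here is a rough back-of-the-envelope sketch. It assumes the worst case, where the 20s keep-alives continue all day, and it ignores wake-ups caused by real downstream data; the function name is only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def wakeups_per_day(interval_s: float) -> float:
    """Number of wake-ups in one day if the device wakes once per interval."""
    return SECONDS_PER_DAY / interval_s


expected = wakeups_per_day(340)  # one PINGREQ per MQTT Keep Alive period
observed = wakeups_per_day(20)   # one NLB keep-alive packet every 20s

print(f"expected: {expected:.0f}/day, observed: {observed:.0f}/day, "
      f"ratio: {observed / expected:.0f}x")
```

Under these assumptions, the device wakes roughly 17 times more often than the MQTT Keep Alive alone would require.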
Only happens with TLS: We found that this happens only if we use a TLS listener (8883) on the NLB and terminate TLS at the NLB. If we remove TLS, add a listener on the non-secure port (1883), and forward the traffic to the target's non-secure port, everything works as expected: the NLB sends no keep-alive packets and there are no 20s wake-ups.
We also tested the same setup with a CLB on an SSL port. It works without any problem and does not send keep-alives to the client (IoT device).
We have removed the TLS and opened the non-secure port as a temporary workaround.
Why does the NLB send keep-alive packets every 20s when we use TLS? Is this intended NLB behaviour? Any idea how we could resolve it?
Overview of the cloud setup:
The MQTT broker runs in ECS Fargate, Multi-AZ, in a private subnet.
The NLB sits between the client (IoT device) and the target (MQTT broker).
The NLB idle timeout keeps being reset by two things:
  The keep-alive sent by the client (IoT device) every 340s.
  The heartbeat published by the target (MQTT broker) every 340s.
The connection remains open.
The NLB offloads TLS on port 8883 and forwards the traffic to target port 1883.
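The reasoning behind the 340s interval can be sketched as a simple gap check. This assumes an NLB idle timeout of 350s, which is not stated above and should be checked against the current AWS documentation; the function and constant names are made up for illustration.

```python
def connection_survives(traffic_times_s, idle_timeout_s, until_s):
    """Return True if no gap between consecutive packets reaches the idle timeout.

    traffic_times_s: times (seconds since connect) at which any packet is seen.
    """
    last = 0.0
    for t in sorted(traffic_times_s):
        if t > until_s:
            break
        if t - last >= idle_timeout_s:
            return False  # the load balancer would have dropped the flow
        last = t
    return until_s - last < idle_timeout_s


NLB_IDLE_TIMEOUT_S = 350  # assumption, not taken from this post
KEEP_ALIVE_S = 340        # MQTT Keep Alive / broker heartbeat from this post

# Ten keep-alive pings, one every 340s.
pings = [KEEP_ALIVE_S * i for i in range(1, 11)]

every_ping = connection_survives(pings, NLB_IDLE_TIMEOUT_S, until_s=3400)
every_other = connection_survives(pings[::2], NLB_IDLE_TIMEOUT_S, until_s=3400)
print(every_ping, every_other)
```

With a ping every 340s, every gap stays below the assumed timeout. If every other ping were skipped, the 680s gap would exceed it and the connection would be dropped.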
I am leaving the answer from AWS here, as it is too long to post as a comment. Here is the answer:
When the NLB (TLS Listener only) receives a TCP KeepAlive packet from either the client or the target, NLB enables generating TCP KeepAlive packets and sends TCP KeepAlive packets to both frontend (NLB -> Client) and backend (NLB -> Target) connection every 20 seconds. This behavior was introduced back in 2019 and is not configurable.
We have opened a feature request to make it configurable (e.g., changing the 20 second interval or turn on/off TCP KeepAlive), and we will use your use case as supporting vote. Unfortunately, there is no ETA on when/if this will be released, however, you can keep an eye on https://repost.aws/ or "What's new" page for updates on new features: https://aws.amazon.com/new
In order to work around this behavior for your use case, consider using TCP listeners and terminating TLS on the Target.
However, I wasn't able to find any description of this TCP keep-alive behaviour for clients connected to an NLB TLS listener.
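One thing that follows from the AWS statement, and that I have not verified, is that the NLB only starts generating keep-alives after it receives a TCP KeepAlive packet from the client or the target. If that is accurate, making sure neither side sends TCP keep-alives might avoid the trigger. The sketch below only shows how SO_KEEPALIVE is turned off on a socket using Python's standard library. The real device and broker will have their own settings for this, and the function name is hypothetical.

```python
import socket


def open_socket_without_tcp_keepalive() -> socket.socket:
    """Create a TCP socket with OS-level TCP keep-alive probes disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # 0 disables SO_KEEPALIVE, so the OS does not send TCP keep-alive probes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 0)
    return sock


sock = open_socket_without_tcp_keepalive()
keepalive_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print("SO_KEEPALIVE enabled:", keepalive_enabled)
sock.close()
```

The MQTT-level keep-alive (PINGREQ every 340s) is application data and is unaffected by this socket option, so it would still reset the NLB idle timer.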