I'm running a RabbitMQ server and client applications in minikube for development. I'm receiving intermittent 501 frame errors. The error occurs fairly consistently under load (60 msg/sec, 2-5 kB/msg).
Error Message
From the RabbitMQ logs.
2023-02-26 16:43:12.635470+00:00 [error] <0.1056.0> operation none caused a connection exception frame_error: "type 3, first 16 octets = <<\"{\\\"payload\\\":{\\\"res\">>: {invalid_frame_end_marker,\n
99}"
2023-02-26 16:43:15.638860+00:00 [error] <0.1056.0> closing AMQP connection <0.1056.0> (10.244.0.18:60608 -> 10.244.0.21:5672):
2023-02-26 16:43:15.638860+00:00 [error] <0.1056.0> fatal_frame_error
Client
A Deno App using the deno-amqp library.
TCP Dump
Wireshark shows the TCP segment (?) sent (just before the server reports an error)
{"payload": { "id"...
("id" and not "res")
The remaining content body is sent to the server just after the server reported the invalid frame end marker. The remaining content contains exactly the missing amount of bytes (44) from the previously started content frame (and its length header).
Validating Frames before sending
I added a check that would flag any frame my AMQP client encodes incorrectly before it is sent. It never fired, so there are no issues there.
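// Frame layout: 7-byte header (type, channel, size), payload, then the frame-end octet 0xCE (206).
if (data[7 + payload.byteLength] !== 206) {
  console.log('sending invalid frame end');
  console.log({ frame, data });
}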
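For context, here is a minimal sketch of the frame layout that check relies on, assuming the general AMQP 0-9-1 frame format (1-byte type, 2-byte channel, 4-byte payload size, payload, frame-end octet 0xCE). The helper names `encodeFrame` and `isWellFormedFrame` are hypothetical and are not part of deno-amqp.

```typescript
// AMQP 0-9-1 general frame: type (1) | channel (2) | size (4) | payload | 0xCE
const FRAME_HEADER_SIZE = 7;
const FRAME_END = 0xce; // 206

// Hypothetical helper: wraps a payload in a frame.
function encodeFrame(type: number, channel: number, payload: Uint8Array): Uint8Array {
  const data = new Uint8Array(FRAME_HEADER_SIZE + payload.byteLength + 1);
  const view = new DataView(data.buffer);
  view.setUint8(0, type);
  view.setUint16(1, channel); // DataView writes big-endian by default
  view.setUint32(3, payload.byteLength);
  data.set(payload, FRAME_HEADER_SIZE);
  data[FRAME_HEADER_SIZE + payload.byteLength] = FRAME_END;
  return data;
}

// Hypothetical helper: the declared size must match the buffer length
// and the last octet must be the frame-end marker.
function isWellFormedFrame(data: Uint8Array): boolean {
  if (data.byteLength < FRAME_HEADER_SIZE + 1) return false;
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  const size = view.getUint32(3);
  return data.byteLength === FRAME_HEADER_SIZE + size + 1 &&
    data[FRAME_HEADER_SIZE + size] === FRAME_END;
}

const body = new TextEncoder().encode('{"payload":{"id":1}}');
const frame = encodeFrame(3, 1, body); // type 3 = content body
console.log(isWellFormedFrame(frame)); // true
```

A check like this only proves that each frame is correct in memory. It says nothing about the order in which the bytes of different frames reach the socket.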
No concurrent TCP connection writes
I have lots of async functions publishing messages. I made sure to always group and fully write all sequential frames (publish method, header, body(s)) to the buffer. I used writeAll. If I understand correctly, Deno.Conn by default stops the event loop while writing.
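To illustrate what "no concurrent writes" has to guarantee, here is a rough sketch under a few assumptions. As far as I understand it, a writeAll-style helper loops over `write()` calls and awaits each one, so other async tasks can run between two partial writes of the same buffer. `PartialWriter`, `writeFully`, `SerializedWriter` and `ChunkedSink` are hypothetical names for this example, and `ChunkedSink` is a test double standing in for the socket, not a Deno API.

```typescript
// Hypothetical minimal writer: write() may accept only part of the buffer.
interface PartialWriter {
  write(p: Uint8Array): Promise<number>;
}

// A writeAll-style loop: keeps calling write() until every byte is accepted.
// Each await yields to the event loop, so other tasks can run in between.
async function writeFully(w: PartialWriter, data: Uint8Array): Promise<void> {
  let written = 0;
  while (written < data.byteLength) {
    written += await w.write(data.subarray(written));
  }
}

// Queues buffers on a promise chain so that one buffer is written completely
// before the next one starts, no matter how many callers publish at once.
class SerializedWriter {
  private tail: Promise<void> = Promise.resolve();
  private readonly w: PartialWriter;

  constructor(w: PartialWriter) {
    this.w = w;
  }

  write(data: Uint8Array): Promise<void> {
    const result = this.tail.then(() => writeFully(this.w, data));
    // Swallow errors on the chain only; the caller still sees them via `result`.
    this.tail = result.catch(() => {});
    return result;
  }
}

// Test double standing in for a socket: accepts at most `limit` bytes per call.
class ChunkedSink implements PartialWriter {
  readonly received: number[] = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  async write(p: Uint8Array): Promise<number> {
    const n = Math.min(this.limit, p.byteLength);
    for (let i = 0; i < n; i++) this.received.push(p[i]);
    return n;
  }
}

async function demo(): Promise<void> {
  const a = new Uint8Array([1, 1, 1, 1, 1]);
  const b = new Uint8Array([2, 2, 2, 2, 2]);

  // Two unsynchronized writers: the partial writes interleave.
  const raw = new ChunkedSink(2);
  await Promise.all([writeFully(raw, a), writeFully(raw, b)]);
  console.log(raw.received.join(""));

  // Same two buffers through the queue: each one arrives contiguously.
  const safe = new ChunkedSink(2);
  const out = new SerializedWriter(safe);
  await Promise.all([out.write(a), out.write(b)]);
  console.log(safe.received.join("")); // 1111122222
}

demo();
```

If the grouped frames (publish method, header, body) are concatenated into one buffer and passed through a queue like this, a frame from another publisher cannot land in the middle of a partially written one.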
Reproducing the error
I haven't been able to reproduce the issue using the same library and creating stress tests. I successfully sent messages many times larger and faster without issue to a docker RabbitMQ instance.
Spreading load over channels
I've tried publishing messages over 10 channels with round-robin distribution. This helped: the app ran much longer, but the error eventually showed up as well.
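The round-robin distribution can be sketched roughly as follows. `RoundRobin` is a hypothetical helper, and plain numbers stand in for the channel objects; this is not the deno-amqp API.

```typescript
// Hands out items from a fixed pool in rotation.
class RoundRobin<T> {
  private next = 0;
  private readonly items: T[];

  constructor(items: T[]) {
    if (items.length === 0) throw new Error("pool must not be empty");
    this.items = items;
  }

  pick(): T {
    const item = this.items[this.next];
    this.next = (this.next + 1) % this.items.length;
    return item;
  }
}

// Stand-ins for the 10 opened channels; a real pool would hold channel objects.
const channelIds = Array.from({ length: 10 }, (_, i) => i + 1);
const pool = new RoundRobin(channelIds);
const picks = Array.from({ length: 12 }, () => pool.pick());
console.log(picks.join(",")); // 1,2,3,4,5,6,7,8,9,10,1,2
```

All channels still share one TCP connection, so spreading the load this way changes the timing of the writes but not the fact that they end up on the same socket.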
Open Questions
Does writeAll mean that I'm guaranteed to be able to write all bytes in one go, regardless of underlying buffer size?
This was a bug in deno-amqp
...
Edit: this has been fixed