Tags: ruby-on-rails, amazon-sqs, shoryuken

SQS + Shoryuken: Large Receive Count in FIFO despite auto_delete=true


I have an AWS SQS FIFO queue configured to deduplicate messages based on content. My Rails app uses a Shoryuken worker to receive messages from SQS. Here is the worker code:

class MyJob
  include Shoryuken::Worker

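  # auto_delete: delete the message from SQS after perform returns without raising
  # body_parser: parse the message body as JSON before it is passed to perform
  shoryuken_options queue: "myjobs-#{ENV['RAILS_ENV']}.fifo",
                    auto_delete: true,
                    body_parser: JSON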

  def perform(message_meta, message_body)
    # do stuff
  end
end

As you can see, it's configured to delete messages from the queue automatically. But today something strange happened: I noticed that the worker was performing a large number of identical tasks. When I opened the SQS queue in the AWS Console, I saw a message in it that appeared to have been received many times by the worker. Here are its attributes; note the Receive Count:
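One detail matters here: as far as I understand Shoryuken, `auto_delete: true` does not delete a message at the moment it is received. It deletes it after `perform` returns without raising, using the receipt handle from that particular receive. The plain-Ruby sketch below only illustrates that ordering; `SqsMessage`, `ExampleJob`, `process_with_auto_delete` and the `deleter` callable are hypothetical names, not Shoryuken's actual code or API.

```ruby
require "json"

# Hypothetical stand-ins, for illustration only; this is not Shoryuken's source.
SqsMessage = Struct.new(:receipt_handle, :body)

class ExampleJob
  def initialize(should_fail: false)
    @should_fail = should_fail
  end

  def perform(_message_meta, message_body)
    raise "job failed" if @should_fail

    message_body.fetch("task")
  end
end

# Run the job first; delete only if perform returned without raising.
# The delete uses the receipt handle from this particular receive.
def process_with_auto_delete(worker, message, auto_delete:, deleter:)
  worker.perform(message, JSON.parse(message.body))
  deleter.call(message.receipt_handle) if auto_delete
end

deleted = []
deleter = ->(handle) { deleted << handle }

process_with_auto_delete(ExampleJob.new, SqsMessage.new("handle-1", '{"task":"a"}'),
                         auto_delete: true, deleter: deleter)

begin
  process_with_auto_delete(ExampleJob.new(should_fail: true),
                           SqsMessage.new("handle-2", '{"task":"b"}'),
                           auto_delete: true, deleter: deleter)
rescue RuntimeError => e
  puts "perform raised: #{e.message}" # the delete step was skipped
end

p deleted # => ["handle-1"]
```

So the longer `perform` runs (or the longer a received message waits before it is processed), the longer the gap between receiving a message and deleting it.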

Message ID: 9207017f-ad15-4de8-97c4-cf391c8f3840
Size: 1.3 KB
MD5 of Body: 55918bf431e31e4badae0720453aea35
Sent: 2018-12-11 10:40:53.978 GMT-08:00
First Received: 2018-12-11 10:40:54.045 GMT-08:00
Receive Count: 2654
Message Attribute Count: 0
Message Group ID: default
Message Deduplication ID: c5fb9acda5e3c9c82dc0ae3f0b1cff5bd7067d0cf942075c4c38dddd1fbc1ed1
Sequence Number: 37288893882837472512

Any idea how that could happen?

Platform details: Ubuntu, Ruby 2.5.3, Rails 5.2.2, Shoryuken 4.0.2


Solution

  • It turns out the problem was the queue's VisibilityTimeout setting. It defaults to 30 seconds, but often more than 30 seconds passed between Shoryuken receiving a message and trying to delete it. By then the visibility timeout had elapsed: the message was visible in the queue again (and was received again), and Shoryuken failed to delete it with the following error:

    ERROR: Could not delete 0, code: 'ReceiptHandleIsInvalid', message: 'The receipt handle has expired', sender_fault: true

    The solution is to increase the VisibilityTimeout. I set it to the maximum allowed value, 12 hours (43,200 seconds), and that resolved the issue.

    More about VisibilityTimeout: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html

    The thread that put me on the right track: https://github.com/aws/aws-sdk-java/issues/705
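To make the failure mode concrete, here is a simplified, hypothetical in-memory model of a queue with a visibility timeout, in plain Ruby with no AWS SDK (`FakeQueue` and `simulate` are invented names). It assumes a single worker whose job takes 45 seconds, and, following the `ReceiptHandleIsInvalid` error in the log above, it treats a delete with an expired receipt handle as a failure. Real SQS has more rules than this model; it is only meant to show why the receive count keeps climbing.

```ruby
# Simplified, hypothetical model of an SQS queue's visibility timeout.
# FakeQueue and simulate are invented for illustration; this is not the AWS SDK.
class FakeQueue
  Message = Struct.new(:body, :visible_at, :receipt_handle)

  def initialize(visibility_timeout:)
    @visibility_timeout = visibility_timeout
    @messages = []
    @handle_counter = 0
  end

  def send_message(body)
    @messages << Message.new(body, 0, nil)
  end

  # Receive the first visible message at time `now` (in seconds) and hide it
  # for the visibility timeout. Every receive issues a fresh receipt handle.
  def receive(now)
    msg = @messages.find { |m| m.visible_at <= now }
    return nil if msg.nil?

    @handle_counter += 1
    msg.visible_at = now + @visibility_timeout
    msg.receipt_handle = "handle-#{@handle_counter}"
    msg
  end

  # Deleting succeeds only with the current handle, before the timeout elapses.
  # Otherwise it fails, like the 'ReceiptHandleIsInvalid' error in the log.
  def delete(receipt_handle, now)
    msg = @messages.find { |m| m.receipt_handle == receipt_handle }
    return :receipt_handle_is_invalid if msg.nil? || now >= msg.visible_at

    @messages.delete(msg)
    :deleted
  end

  def size
    @messages.size
  end
end

# One worker repeatedly receives the message, spends `job_seconds` on it and
# then auto-deletes it. Returns [receive_count, delete_results].
def simulate(visibility_timeout:, job_seconds:, max_receives: 5)
  queue = FakeQueue.new(visibility_timeout: visibility_timeout)
  queue.send_message('{"task":"example"}')
  now = 0
  receive_count = 0
  results = []

  while queue.size.positive? && receive_count < max_receives
    msg = queue.receive(now)
    break if msg.nil?

    receive_count += 1
    handle = msg.receipt_handle
    now += job_seconds                   # perform runs
    results << queue.delete(handle, now) # auto delete after perform returns
  end

  [receive_count, results]
end

p simulate(visibility_timeout: 30, job_seconds: 45)
# 5 receives (capped by max_receives), and every delete fails
p simulate(visibility_timeout: 43_200, job_seconds: 45)
# => [1, [:deleted]]
```

With the default 30-second timeout every delete fails and the message keeps coming back, which is the pattern behind a Receive Count in the thousands; once the timeout is longer than the time between receive and delete, the first delete succeeds. On a real queue the setting can be changed in the SQS console or, for example, with the aws-sdk-sqs gem's `Aws::SQS::Client#set_queue_attributes(queue_url: url, attributes: { "VisibilityTimeout" => "43200" })`.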