Search code examples
ruby-on-railsmongodbmongoidsidekiqsidetiq

Sidekiq fails, dequeueing mongoid connection times out, probably too many connections


I am currently running sidekiq 4.1.2. I've never managed to be able to run more than a handfull of jobs concurrently. Recently it's been looking like I've run into an issue described in the Sidekiq's Troubleshooting WIKI called Too many connections to MongoDB. Apparently, mongoid 3 doesn't properly disconnect workers. However, I am using mongoid 5.1.3.

My issue surfaces when a job, while a few other jobs are running, tries to hit the database with a query:

Timeout::Error: Timed out attempting to dequeue connection after 30 sec.
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/server/connection_pool/queue.rb:190:in `wait_for_next!'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/server/connection_pool/queue.rb:176:in `block in dequeue_connection'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/server/connection_pool/queue.rb:190:in `wait_for_next!'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/server/connection_pool/queue.rb:176:in `block in dequeue_connection'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/server/connection_pool/queue.rb:172:in `loop'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/server/connection_pool/queue.rb:172:in `dequeue_connection'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/server/connection_pool/queue.rb:62:in `block in dequeue'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/server/connection_pool/queue.rb:61:in `synchronize'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/server/connection_pool/queue.rb:61:in `dequeue'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/server/connection_pool.rb:51:in `checkout'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/server/connection_pool.rb:107:in `with_connection'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/server/context.rb:63:in `with_connection'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/operation/executable.rb:34:in `execute'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/collection/view/iterable.rb:80:in `send_initial_query'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/collection/view/iterable.rb:41:in `block in each'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/retryable.rb:51:in `call'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/retryable.rb:51:in `read_with_retry'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongo-2.2.5/lib/mongo/collection/view/iterable.rb:39:in `each'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongoid-5.1.3/lib/mongoid/query_cache.rb:207:in `each'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongoid-5.1.3/lib/mongoid/contextual/mongo.rb:121:in `each'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongoid-5.1.3/lib/mongoid/contextual/mongo.rb:295:in `map'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongoid-5.1.3/lib/mongoid/contextual/mongo.rb:295:in `map'
/home/me/applications/myapp/shared/bundle/ruby/2.2.0/gems/mongoid-5.1.3/lib/mongoid/contextual.rb:20:in `map'
/home/me/applications/myapp/releases/20160618143407/app/jobs/myjob.rb:8:in `block in perform'

After one job fails, the other jobs fail soon after. This most often happens after a few handfull of jobs have finished succesfully, which could indicate that those jobs don't disconnect from the database. Looking at top I can't see that mongo load cpu is too much.

At the same time this started to occur, I noticed that my sidetiq 0.7.0 enabled recurring jobs were not scheduled properly. One job has stopped being queued, and others are only queued once after restart.

According to my Sidekiq web interface I have 1 queue, called default, with 25 threads. Max. 12-15 of them get busy at the same time.

Any idea how to troubleshoot this issue?


Solution

  • The default max pool queue size is 5. Bumping max_pool_size up to e.g. 25 will enable more connections to your db.

    production:
      clients:
        default:
          options:
            max_pool_size: 25