Sidekiq documentation says:
Don't set the concurrency higher than 50. I've seen stability issues with concurrency of 100, for example
Well, my low memory consumption lets me run a concurrency of 350 threads on a single 512MB 1X Heroku dyno. I would like to use ~300, because all the jobs are IO-intensive (HTTP requests).
I wonder what issues I might encounter.
I monitored the logs under heavy load with a concurrency of 80 and saw no issues.
What issues should I expect when setting a concurrency of 300 threads? Do I risk jobs being terminated without being moved to the "dead" queue, or just worker terminations that I will be able to observe? Is it safe to set a concurrency of 300, or even 100?
The owner of Sidekiq doesn't know the answer; I opened a GitHub issue about it.
UPDATE: Under high load, when I increased the concurrency from 80 to 100, I started getting 'can't create Thread: Resource temporarily unavailable' errors here and there. In extreme cases, at 180 threads, it sometimes terminates the entire Sidekiq process.
Memory consumption was always between 140MB and 240MB according to Heroku metrics.
I sent Sidekiq the TTIN signal to dump the thread backtraces
and found that most threads were waiting on these lines of code:
app[worker.1]: 3 TID-ow5z46exw WARN: /app/vendor/ruby-2.3.0/lib/ruby/2.3.0/monitor.rb:187:in `lock'
app[worker.1]: 3 TID-os9ulw8ps WARN: /app/vendor/ruby-2.3.0/lib/ruby/2.3.0/net/http.rb:880:in `initialize'
app[worker.1]: 3 TID-os9ulw8ps WARN: /app/vendor/ruby-2.3.0/lib/ruby/2.3.0/timeout.rb:95:in `join'
app[worker.1]: 3 TID-osjnd6zac WARN: /app/vendor/ruby-2.3.0/lib/ruby/2.3.0/net/protocol.rb:158:in `wait_readable'
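For readers who want a similar dump outside of Sidekiq, here is a minimal sketch of how per-thread backtraces like the ones above can be collected in plain Ruby. The helper name `thread_dump_lines` is hypothetical and this is not Sidekiq's own TTIN handler; it only relies on `Thread.list`, `Thread#status`, and `Thread#backtrace` from the Ruby core library.

```ruby
# Hypothetical helper (not part of Sidekiq): build a text dump with one
# header line per live Ruby thread, followed by its indented backtrace.
def thread_dump_lines
  Thread.list.flat_map do |thread|
    header = "TID-#{thread.object_id.to_s(36)} status=#{thread.status}"
    # Thread#backtrace returns nil if the thread has already died.
    frames = thread.backtrace || ["<no backtrace available>"]
    [header] + frames.map { |frame| "  #{frame}" }
  end
end

# Usage: print the dump for the current process.
puts thread_dump_lines
```

Counting how many threads share the same top frame in such a dump gives a rough picture of where the workers spend their time.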
Everything is documented in the GitHub issue.
The owner of Sidekiq says that the traces look fine, so no luck spotting the root cause of the stability issue, but there is now data on how many threads trigger it and what the symptoms are.
Well, the Sidekiq stability issues at high concurrency are as follows.
When you set a concurrency higher than 80 (or 50), you may encounter this error: "can't create Thread: Resource temporarily unavailable".
Some jobs will be returned to the queue; sometimes the entire process will be terminated and jobs will be lost, unless you use the Sidekiq Pro reliability feature.
It seems that we are hitting Heroku's limit of 256 threads, even though Sidekiq is configured to use only 80 threads. Running multiple Sidekiq processes inside a single Heroku dyno doesn't help; when I tried it, I still ran into this limit.
It looks like a thread leak, and this is the next thing to investigate.
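As a starting point for that investigation, here is a minimal sketch that compares the number of live Ruby threads with the number of OS-level threads in the current process. The helper name `thread_counts` is hypothetical, and the OS figure is read from `/proc/self/status`, so it assumes a Linux host (it returns `nil` for that value elsewhere). One possible contributor worth checking, as an assumption rather than a confirmed diagnosis: in Ruby 2.3, `Timeout.timeout` starts a helper thread for each call, which would be consistent with the `timeout.rb` frames in the trace and would push the real thread count above the configured concurrency.

```ruby
# Hypothetical helper: report Ruby-level and OS-level thread counts for
# the current process. The OS count comes from /proc/self/status (Linux).
def thread_counts
  os_threads = nil
  status_file = "/proc/self/status"
  if File.readable?(status_file)
    line = File.foreach(status_file).find { |l| l.start_with?("Threads:") }
    os_threads = line.split[1].to_i if line
  end
  { ruby: Thread.list.size, os: os_threads }
end

# Usage: log this periodically under load and watch whether it keeps growing.
puts thread_counts.inspect
```

If the count climbs well past the configured concurrency while the jobs are running, the extra threads are coming from somewhere other than Sidekiq's worker pool.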
The above also happens while memory consumption stays low (< 240MB in my example).
Everything is updated in the GitHub issue.