ruby-on-rails ruby puma puma-dev

puma workers vs separate ec2 instances


I am coming from a Java/Tomcat stack and am a total newbie to the RoR stack. I am trying to understand some of the concepts around Puma configuration. I have read this and this but I am still unclear on the workers terminology.

I understand that workers result in child processes running Puma. So essentially that allows you to achieve parallelism when using a multi-core instance. But couldn't you achieve the same thing by launching as many single-core EC2 instances?

Also, would it ever make sense to set workers > 0 if the instance is not multi-core?

Any info here would immensely help me. Thanks!


Solution

  • In the context of Puma, workers and threads are both used to achieve concurrency, so that Puma can process requests without always waiting for previous requests to finish. A good configuration needs to strike a balance between the number of workers and threads, and several aspects of the deployed application need to be taken into consideration:

    • Workers:

      • Have a bigger memory overhead, since each forked process needs its own memory (this is mitigated on Linux by copy-on-write, https://en.wikipedia.org/wiki/Copy-on-write, but it is still a factor)
      • Allow for parallelism when multiple cores are available. This mostly matters when processing requests is computationally heavy - which is something to avoid - if a request needs to perform some heavy computation, it's a good idea to move it to a background job using a library like Sidekiq (https://github.com/mperham/sidekiq)
      • Can't be used on JRuby since it does not support forking
    • Threads:

      • The configured number of threads runs in each worker process - meaning if you have x workers and y threads, you get a total of x * y request-processing threads
      • Share memory, so they have a smaller memory footprint (though there are gotchas here as well: https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html)
      • On MRI, which is the default Ruby implementation, threads do not allow executing Ruby code in parallel due to the GIL. This should not be a big concern, as the GIL is not held while waiting on IO - which is where a lot of the execution time will be spent: accessing the database, communicating with APIs, etc.
      • On JRuby, threads can achieve parallelism.
      • Can't be used if your app is not thread-safe. Rails itself is thread-safe, but you have no guarantees for any third-party code the app depends on, or for the app code itself. If the app is not thread-safe, the answer here is easy - don't use threads (meaning configure min and max threads to 1). Lack of thread safety is a situation where a multi-worker configuration makes sense even on a single-core instance.
      • For any number of threads, you need to make sure there are enough database connections in the connection pool. This typically means setting the Rails connection pool size to the number of threads you run in a worker process.
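    To see the GIL point concretely, here is a small standalone Ruby script (not Puma-specific - `slow_io` is just a stand-in for a database query or API call) showing that on MRI, four IO-bound "requests" running on threads overlap their waits instead of adding up:

    ```ruby
    require "benchmark"

    # Simulates an IO-bound request handler. sleep releases the GIL,
    # just like waiting on a database or an HTTP response would.
    def slow_io
      sleep 0.2
    end

    # Run four "requests" one after another: roughly 4 * 0.2 = 0.8s.
    sequential = Benchmark.realtime { 4.times { slow_io } }

    # Run the same four "requests" on threads: the IO waits overlap,
    # so total time is close to a single 0.2s wait.
    threaded = Benchmark.realtime do
      4.times.map { Thread.new { slow_io } }.each(&:join)
    end

    puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
    ```

    CPU-bound work would show no such speedup on MRI, which is exactly why heavy computation belongs in a background job rather than in a request thread.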

    Comparing multiple workers to deploying to multiple EC2 instances misses part of the picture: when using Puma with multiple workers, there's a master Puma process that listens on a port and routes each request to an available worker process. When you have multiple EC2 instances, you need to take care of load balancing between them in some way - in the case of AWS, that could be ELB or ALB. Deploying to multiple instances with load balancing is the right way to deploy any serious web application anyway, but that should not stop you from utilizing instance resources better through workers and threads.

    I'd suggest experimenting with the configuration of workers and threads: start by setting workers to the number of cores and threads to 10, then make adjustments if you encounter problems with memory usage or underutilization of resources.
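    A starting configuration along those lines might look like the following `config/puma.rb` sketch. `WEB_CONCURRENCY` and `RAILS_MAX_THREADS` are conventional environment variable names, not requirements, and the defaults here are just the starting points suggested above:

    ```ruby
    # config/puma.rb - a starting point, assuming a multi-core instance.
    # Tune WEB_CONCURRENCY / RAILS_MAX_THREADS after observing memory
    # usage and CPU utilization in production.
    workers Integer(ENV.fetch("WEB_CONCURRENCY", 4))      # ~ number of cores
    threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 10))
    threads threads_count, threads_count

    # Load the app before forking so copy-on-write can share memory
    # between the master and worker processes.
    preload_app!

    # Each forked worker needs its own database connections.
    # Make sure the pool in config/database.yml is at least threads_count.
    on_worker_boot do
      ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
    end
    ```

    With `preload_app!`, the worker memory overhead mentioned earlier is reduced, since forked workers initially share the preloaded application code via copy-on-write.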