multithreading, amazon-web-services, amazon-ecs, aws-serverless, aws-fargate

AWS ECS Fargate and multi threading


Context: I am new to the "serverless" concept. I am creating a pick-and-place application. Basically, the application will consume (pick) messages from 40 queues and send (place) them in a single outgoing FIFO queue (to maintain the sequence). The logic requires about 10 workers/threads running in parallel, each working on different queues.

Please don't suggest Lambda. It doesn't fit my use case.

I am planning to build this application on AWS ECS with Fargate. Will there be any problem if I use Fargate for my Java application, which will create those 10 threads?

Is there any problem with multithreading when using Fargate (which is a serverless concept)?


Solution

  • On a physical machine, you have a certain number of

    • CPUs (examples: 1 CPU on your laptop or 4 CPUs on a server),
    • each CPU has Y cores (example: 6 cores),
    • each core can potentially do hyperthreading (usually 2 threads per core). Think of CPU core threads as conveyor belts leading to the core: when one conveyor belt is empty, the core can work on things coming in on the other conveyor belt. In many architectures (like Intel), there are two threads (conveyor belts) for each CPU core. Of course, if both conveyor belts are fully loaded (if you are running a very intensive task), then there will be switching costs. I believe that Amazon's new CPU, the Graviton, has 1 thread per core (no hyperthreading). So, you need to look at each server instance specifically to know how many threads each core has.

    Now, do not confuse CPU threads (ex: 2 threads per CPU core) and application threads! Those are two totally different things!

    You then need to understand that each OS uses the above CPUs/cores/threads differently. It creates processes and threads, and it uses time slices on those CPUs/cores/threads. For example, on your laptop, you likely have only one CPU with 2 to 6 cores (for Intel processors, depending on i3, i5, i7), or a few more on the latest Apple M1. In reality, on your laptop, you run your browser, you might run an IDE, you might run a web server, an application server, Docker, Excel, whatever else. These are LOTS of processes and application threads. Way more than there are CPUs/cores/threads. It's the Operating System (OS) that slices the work and puts it on the conveyor belts. In Linux, you can make some processes "nice" so they yield to other processes, or you can make them "take all they can" from the processor. There are many ways to slice up the work. So, you need to look at the OS as well.

    Another example: when I install Apache Tomcat on my development laptop, Tomcat runs on a JVM and starts thread pools which might contain dozens of threads. And then I'd install an application on that Tomcat server which might have a database connection pool with 20 connections, each used by some thread. As you can see, just my Tomcat server is probably running 30-40 Java threads, while my Intel i7 laptop only has 1 CPU with 6 cores and supports Hyper-Threading: 1 x 6 x 2 = 12 CPU threads.
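To make the Tomcat point concrete, here is a minimal sketch (not taken from any real server) that starts 40 application threads regardless of how many processors the JVM reports. The class name `ThreadsVsCores`, the thread count, and the 50 ms sleep are illustrative assumptions; the sleep stands in for waiting on a database or a queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadsVsCores {
    /** Starts `count` application threads that each wait briefly; returns how many finished. */
    static int runSleepingThreads(int count) {
        AtomicInteger finished = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(50); // assumption: simulates waiting on I/O
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                finished.incrementAndGet();
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait for every application thread to end
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining threads", e);
            }
        }
        return finished.get();
    }

    public static void main(String[] args) {
        // Number of processors the JVM believes it may use (often far fewer than 40).
        int visible = Runtime.getRuntime().availableProcessors();
        int done = runSleepingThreads(40);
        System.out.println("Processors visible to the JVM: " + visible);
        System.out.println("Application threads completed: " + done);
    }
}
```

All 40 threads complete even when the JVM sees only one or two processors, because the OS time-slices them.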

    In AWS, everything is virtualized, so 1 vCPU does NOT map to 1 CPU! A vCPU actually maps to one core thread. And that gets confusing because AWS doesn't use the same CPU on all servers. You need to look at the documentation to see which server class maps to which number of threads, etc. For example, I believe on Intel Xeon processors, 1 vCPU = 1 hyperthread (so, one of the two conveyor belts leading to one core). But for servers that use the new Graviton CPU (which, I think, has a single thread per CPU core), 1 vCPU gets you one whole core.

    Finally, in AWS Fargate, you specify CPU units, where 1024 units = 1 vCPU. This is hard to mentally process, but think of how the OS time-slices processes and how your laptop is currently running a lot (maybe hundreds) of processes and threads yet only has 1 CPU and a few cores. Think of it the same way with CPU units: you get a slice of the CPU. Or think of it as getting access to one of those conveyor belts leading to the core: if you set CPU units to 1024, you get the equivalent of "1 core". Note that, in reality, it's actually better than that because AWS is packing those conveyor belts, so my example is a little "flaky" (but I guess you get the idea).
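The unit arithmetic is simple enough to write down. This is only a sketch of the conversion stated above (1024 units = 1 vCPU); the class and method names are made up for illustration and are not part of any AWS SDK.

```java
class CpuUnits {
    /** Fargate CPU units per vCPU, as stated above (1024 units = 1 vCPU). */
    static final int UNITS_PER_VCPU = 1024;

    /** Converts a task's CPU units into the fraction of a vCPU they represent. */
    static double toVcpu(int cpuUnits) {
        return (double) cpuUnits / UNITS_PER_VCPU;
    }

    public static void main(String[] args) {
        for (int units : new int[] {256, 512, 1024}) {
            System.out.println(units + " CPU units = " + toVcpu(units) + " vCPU");
        }
    }
}
```

Running it prints 0.25, 0.5, and 1.0 vCPU for 256, 512, and 1024 units respectively.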

    Now, how many threads can you run on an ECS container in Fargate if, for example, you set your Docker container/task to use only 256 or 512 CPU units (a quarter or half of one conveyor belt leading to a core)? That is hard to say because it depends on what you're doing. If you're solving math-intensive problems that use the CPU threads to the fullest, you probably can't run too many application threads. But if you're running an application server that waits around a lot (waiting for responses from the database, waiting for requests from the users, etc.), then you can crank up the number of threads.
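A pick-and-place worker like the one in the question mostly waits on queues, so it falls in the "waits around a lot" category. Below is a rough sketch of 10 workers draining 40 queues into one outgoing queue. Everything here is an assumption for illustration: the in-memory `BlockingQueue`s stand in for the real queue service, and `PickAndPlaceSketch`, `move`, and `sampleSources` are hypothetical names, not an AWS API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PickAndPlaceSketch {
    /**
     * Drains every source queue into one outgoing queue using `workers` threads.
     * Worker w handles source queues w, w + workers, w + 2 * workers, ...
     */
    static BlockingQueue<String> move(List<BlockingQueue<String>> sources, int workers) {
        BlockingQueue<String> outgoing = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            final int first = w;
            pool.execute(() -> {
                for (int q = first; q < sources.size(); q += workers) {
                    String message;
                    // poll() returns null once this in-memory queue is empty
                    while ((message = sources.get(q).poll()) != null) {
                        outgoing.add(message);
                    }
                }
            });
        }
        pool.shutdown(); // no new tasks; already submitted workers keep running
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return outgoing;
    }

    /** Builds `queues` in-memory queues, each preloaded with `perQueue` sample messages. */
    static List<BlockingQueue<String>> sampleSources(int queues, int perQueue) {
        List<BlockingQueue<String>> sources = new ArrayList<>();
        for (int q = 0; q < queues; q++) {
            BlockingQueue<String> queue = new LinkedBlockingQueue<>();
            for (int m = 0; m < perQueue; m++) {
                queue.add("queue-" + q + "-message-" + m);
            }
            sources.add(queue);
        }
        return sources;
    }

    public static void main(String[] args) {
        BlockingQueue<String> outgoing = move(sampleSources(40, 5), 10);
        System.out.println("Messages placed on the outgoing queue: " + outgoing.size());
    }
}
```

In this sketch each source queue is drained by exactly one worker, so messages from the same source keep their relative order, but messages from different sources interleave. Whether that matches the ordering you need from the real FIFO queue is something to check against your own requirements.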

    In the end, you likely want to load test your application. If you use too many threads, your application will spend a lot of time switching from one thread to another (so that it is fair to all threads) and your app will crawl. If you use too few, you are leaving capacity on the table. The only way to know for sure is to test it and find the sweet spot.
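As a starting point for that kind of experiment, here is a small probe that times the same batch of tasks at several pool sizes. It is only a sketch: `PoolSizeProbe` is a made-up name, and the `Thread.sleep` call is an assumption that simulates I/O wait, so CPU-heavy work would behave differently. The numbers only mean something when you swap in your real workload and run it inside a task with the CPU units you actually plan to use.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizeProbe {
    /** Runs `tasks` simulated blocking tasks on `poolSize` threads; returns elapsed milliseconds. */
    static long elapsedMillis(int poolSize, int tasks, long sleepMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(tasks);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(sleepMillis); // assumption: replace with your real workload
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(); // block until every task has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (int poolSize : new int[] {1, 5, 10, 20, 40}) {
            long millis = elapsedMillis(poolSize, 40, 50);
            System.out.println(poolSize + " threads: " + millis + " ms for 40 tasks");
        }
    }
}
```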

    Mistakes you should not make:

    • think that 1 physical CPU = 1 vCPU (totally not the case; 1 vCPU is more likely 1 CPU thread),
    • think that 1 core = 2 threads (not always; it depends on the CPU architecture and other things, but it will likely be 1 or 2 threads, so check the AWS documentation for exact values),
    • think that 1 application thread = 1 CPU thread (these are totally different things),
    • think that if you have 1 CPU thread, that you can only run single-threaded applications (that is totally not the case).

    Remember, 1 vCPU (~ 1-2 cpu threads) can run MANY application threads. Only you can figure out what is too low, what is too high, and where the sweet spot is. I hope this helps. Feel free to correct this post if/where I made mistakes and/or if I made too big logical shortcuts (I also struggled with this so I'm happy to be corrected).