My textbook says:
should you need to run hundreds or thousands of concurrent I/O-bound operations, a thread-based approach consumes hundreds or thousands of MB of memory purely in thread overhead.
and
when running multiple long-running tasks in parallel, performance can suffer; in that case a thread-based approach is better
I'm confused. What's the difference between the overhead of a non-ThreadPool thread and that of a ThreadPool thread? And how is that overhead related to I/O-bound operations?
And finally, why is a thread-based approach (for example, new Thread(runMethod).Start()) better for long-running tasks?
The ThreadPool has a limited number of reusable threads. These threads are used to run tasks (e.g. Task.Run). A task that executes for a long time occupies a pool thread, so that thread can't be reused for another Task. To make sure enough ThreadPool threads stay available (e.g. for async/await, Parallel LINQ, etc.), you should run this kind of task on threads that are independent of the ThreadPool.
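A minimal sketch of the problem, assuming .NET 5 or later (top-level statements): it ties up a few pool threads with blocking work started via Task.Run, then reads ThreadPool.GetAvailableThreads before and while they are blocked. The blocker count and the ManualResetEventSlim used to simulate long-running work are illustrative choices, not anything the ThreadPool requires.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// How many pool threads we deliberately tie up with "long-running" work.
const int blockers = 4;

ThreadPool.GetAvailableThreads(out int availableBefore, out _);

using var allStarted = new CountdownEvent(blockers);
using var release = new ManualResetEventSlim(false);
var ranOnPool = new bool[blockers];

var tasks = new Task[blockers];
for (int i = 0; i < blockers; i++)
{
    int index = i; // copy the loop variable for the lambda
    tasks[i] = Task.Run(() =>
    {
        ranOnPool[index] = Thread.CurrentThread.IsThreadPoolThread;
        allStarted.Signal(); // this pool thread is now busy
        release.Wait();      // block, as a long-running task would
    });
}

allStarted.Wait(); // every blocker now occupies a ThreadPool thread
ThreadPool.GetAvailableThreads(out int availableWhileBlocked, out _);

release.Set();     // let the blocked tasks finish
Task.WaitAll(tasks);

Console.WriteLine($"Available worker threads: {availableBefore} before, {availableWhileBlocked} while blocked");
```

If many such blockers pile up, other queued work (including async/await continuations) has to wait until the pool injects additional threads, which it does only gradually.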
You can run a task on such a thread by calling Task.Factory.StartNew(Action, TaskCreationOptions) (or any other overload that accepts a TaskCreationOptions value) and passing TaskCreationOptions.LongRunning. With the default task scheduler, LongRunning makes the task run on a new, dedicated thread that is independent of the ThreadPool.
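A short sketch, assuming .NET 5 or later and the default TaskScheduler, that checks Thread.CurrentThread.IsThreadPoolThread from inside a Task.Run task and from inside a task started with TaskCreationOptions.LongRunning. It uses the StartNew overload that also takes a CancellationToken and a TaskScheduler so the default scheduler is used explicitly; a custom scheduler is free to treat LongRunning only as a hint.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Short work item: Task.Run queues it to the ThreadPool.
bool runOnPool = await Task.Run(() => Thread.CurrentThread.IsThreadPoolThread);

// Long-running work item: LongRunning asks the scheduler for a dedicated thread.
// TaskScheduler.Default is passed explicitly; without it, StartNew uses
// TaskScheduler.Current, which is not always the default scheduler.
bool longRunningOnPool = await Task.Factory.StartNew(
    () => Thread.CurrentThread.IsThreadPoolThread,
    CancellationToken.None,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);

Console.WriteLine($"Task.Run on pool thread: {runOnPool}");
Console.WriteLine($"LongRunning on pool thread: {longRunningOnPool}");
```

With the default scheduler, the first check is expected to be true and the second false.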
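So for long-running work that blocks its thread, for example a loop that synchronously reads from a file or database, you can use a ThreadPool-independent thread by calling Task.Factory.StartNew(() => DoAction(), TaskCreationOptions.LongRunning). You don't need new Thread(runMethod).Start() at all.
For I/O-bound operations that offer asynchronous APIs (for example, Stream.ReadAsync), prefer async/await instead. Every thread, pooled or not, has its own stack (on the order of a megabyte of reserved memory by default), and a thread blocked on synchronous I/O holds that memory while doing nothing. That is the overhead your textbook refers to: with hundreds or thousands of concurrent I/O-bound operations, one thread per operation means hundreds or thousands of MB. An awaited asynchronous I/O operation does not block any thread while it is pending.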
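The textbook passage in the question is about many concurrent I/O-bound operations. A minimal sketch of the asynchronous alternative, assuming .NET 5 or later (ThreadPool.ThreadCount needs .NET Core 3.0+). FakeIoAsync is a hypothetical stand-in for a real asynchronous I/O call such as a database query; it uses Task.Delay because that also waits without holding a thread.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int operations = 1000;

// Start every operation up front; each one is awaiting, not blocking a thread.
Task<int>[] pending = Enumerable.Range(0, operations)
    .Select(id => FakeIoAsync(id))
    .ToArray();
int[] results = await Task.WhenAll(pending);

// Number of pool threads that exist now, compared with 1000 operations.
int poolThreads = ThreadPool.ThreadCount;

Console.WriteLine($"{results.Length} operations completed; pool thread count: {poolThreads}");

// Hypothetical stand-in for asynchronous I/O (database query, HTTP request,
// file read). Task.Delay waits on a timer and holds no thread meanwhile.
static async Task<int> FakeIoAsync(int id)
{
    await Task.Delay(500);
    return id;
}
```

A thread-per-operation version of the same loop would need 1000 threads, each with its own stack, for the half second they spend waiting.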
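ThreadPool threads are more resource-efficient because they are reused: when work is queued to the ThreadPool, its threads usually already exist. Creating a new thread is always comparatively expensive: the operating system has to create the thread and reserve memory for its stack, and the runtime has to set up its own per-thread state. This is why, when performance matters, reusable ThreadPool threads are the preferable choice as long as the workload is lightweight (short-running). For long-running work, the one-time cost of creating a dedicated thread is small relative to the work itself, and using one keeps the pool's threads free for short tasks.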
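A crude timing sketch of the reuse argument, assuming .NET 5 or later: it runs the same trivial work items once on brand-new threads and once via ThreadPool.QueueUserWorkItem. The work-item count is arbitrary and the absolute numbers depend on the machine and runtime, so treat it as an illustration rather than a benchmark; a tool such as BenchmarkDotNet is better suited for real measurements.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

const int workItems = 200;

// 1) A brand-new thread for every tiny work item.
int doneWithNewThreads = 0;
var sw = Stopwatch.StartNew();
var threads = new Thread[workItems];
for (int i = 0; i < workItems; i++)
{
    threads[i] = new Thread(() => Interlocked.Increment(ref doneWithNewThreads));
    threads[i].Start();
}
foreach (var t in threads) t.Join(); // wait for every thread to exit
TimeSpan newThreadsTime = sw.Elapsed;

// 2) The same work items queued to the ThreadPool, whose threads are reused.
int doneWithPool = 0;
using var allDone = new CountdownEvent(workItems);
sw.Restart();
for (int i = 0; i < workItems; i++)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        Interlocked.Increment(ref doneWithPool);
        allDone.Signal(); // report completion of this work item
    });
}
allDone.Wait(); // wait until every queued item has run
TimeSpan poolTime = sw.Elapsed;

Console.WriteLine($"New threads: {newThreadsTime.TotalMilliseconds:F1} ms, ThreadPool: {poolTime.TotalMilliseconds:F1} ms");
```

The pooled run typically finishes faster because most work items run on threads that already exist, while the first loop pays thread creation and teardown 200 times.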