What is the best algorithm to use for scheduling an application that will support 10K concurrent threads with heavy I/O but low CPU usage? Links to papers are appreciated.
Why wouldn't you use SCHED_RR? You said it yourself: low CPU usage. You could even nice the process when you expect to do some heavy I/O so you're scheduled less often than other processes.
In general, though, why not let the OS do what it's best at, and just worry about writing efficient code? The OS will know you're doing a blocking I/O call and will put your thread/task in a waitqueue and select another task to run. You don't need to worry about those details.