OK, so I'm solving a very parallel problem.
In any case, what inspired me to write this (in part) was realising that I have access to a machine with dual Xeon E5520 CPUs (with, IIRC, 16GB of RAM to go with them).
I know that each CPU supports 8 active threads. But there are background processes (and likely other users) using up some of those (in fact, probably more than all of them). So what is a good rule of thumb for how many threads make things go faster before their overhead starts holding them back? (I guess such a rule would need to take into account how many threads can be active at once.)
There is no such rule. It will depend on many factors, particularly on whether your app is I/O bound (it sounds like yours isn't). The thing to do is to parameterise the number of threads so that it can be specified from a config file or from the command line, and then play around with this number until you hit a sweet spot for your particular problem and configuration.
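As a rough sketch of what that parameterisation could look like, here is a small Python example. Everything in it is a placeholder: the `--workers` flag, the `work` function and the task sizes are made up for illustration, and it uses worker processes rather than threads because CPU-bound Python threads are serialised by the interpreter lock in CPython. The same idea applies in any language.

```python
import argparse
import os
import time
from concurrent.futures import ProcessPoolExecutor


def work(n):
    # Hypothetical CPU-bound task standing in for the real computation.
    return sum(i * i for i in range(n))


def parse_workers(argv=None):
    # Read the worker count from the command line; the default is the
    # number of logical CPUs the OS reports, which is only a starting guess.
    parser = argparse.ArgumentParser()
    parser.add_argument("--workers", type=int, default=os.cpu_count() or 1)
    args = parser.parse_args(argv)
    if args.workers < 1:
        parser.error("--workers must be at least 1")
    return args.workers


def run(workers, tasks):
    # Run all tasks on a pool of the given size and report the wall time.
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(work, tasks))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    workers = parse_workers()
    _, elapsed = run(workers, [200_000] * 64)
    print(f"workers={workers} elapsed={elapsed:.2f}s")
```

Running it a few times with different values (for example `--workers 4`, `--workers 8`, `--workers 16`) on the actual machine, under its usual load, should show where the timings stop improving.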