multithreading, performance, multicore

Which is more Efficient? More Cores or More CPUs


I realize this is more of a hardware question, but it's also very relevant to software, especially when programming for multi-threaded, multi-core/CPU environments.

Which is better, and why? Whether it be regarding efficiency, speed, productivity, usability, etc.

1.) A computer/server with 4 quad-core CPUs?

or

2.) A computer/server with 16 single-core CPUs?

Please assume all other factors (speed, cache, bus speeds, bandwidth, etc.) are equal.

Edit:

I'm interested in the performance aspect in general. If one option is particularly good at one thing and terrible (or not preferable) at another, I'd like to know that as well.

And if I have to choose, I'd be most interested in which is better with regard to I/O-bound applications and compute-bound applications.


Solution

  • That's not an easy question to answer. Computer architecture is unsurprisingly rather complicated. Below are some guidelines but even these are simplifications. A lot of this will come down to your application and what constraints you're working within (both business and technical).

CPUs have several (generally 2-3) levels of cache on the chip. Some modern CPUs also have the memory controller on the die, which can greatly improve the speed of moving data between cores. Memory traffic between separate CPUs has to cross an external bus, which tends to be slower.

    Complicating all this is the bus architecture. AMD chips use HyperTransport, which is a point-to-point protocol. Intel's Core 2 Duo/Quad systems use a shared bus: think of it like Ethernet or cable internet, where there is only so much bandwidth to go round and every new participant just takes another share of the whole. Core i7 and newer Xeons use QuickPath, which is quite similar to HyperTransport.

    More cores will occupy less space, use less power and cost less (unless you're using really low-powered CPUs), both in per-core terms and in the cost of the other hardware (e.g. motherboards).

    Generally speaking, one CPU will be the cheapest option (in terms of both hardware and software), since commodity hardware can be used. Once you go to a second socket you tend to need different chipsets, more expensive motherboards and often more expensive RAM (e.g. ECC fully-buffered RAM), so you take a massive cost hit going from one CPU to two. It's one reason so many large sites (including Flickr, Google and others) use thousands of commodity servers (although Google's servers are somewhat customized to include things like a 9V battery, the principle is the same).

    Your edits don't really change much. "Performance" is a highly subjective concept: performance at what? Bear in mind, though, that if your application isn't sufficiently multithreaded (or multiprocess) to take advantage of extra cores, then adding more cores can actually decrease performance.
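    The limit on what extra cores can buy is often sketched with Amdahl's law (my addition, not part of the original answer): if a fraction `p` of the work parallelizes perfectly, the speedup on `n` cores is bounded by `1 / ((1 - p) + p / n)`.

    ```python
    # Amdahl's law: upper bound on speedup from n cores when only a
    # fraction p of the program's work can run in parallel.
    def amdahl_speedup(p: float, n: int) -> float:
        return 1.0 / ((1.0 - p) + p / n)

    # Even with 90% of the work parallelizable, the serial 10% caps the
    # speedup below 10x no matter how many cores you add.
    for n in (1, 4, 16, 64):
        print(f"{n:3d} cores -> {amdahl_speedup(0.9, n):.2f}x")
    ```

    This is why "more cores" is not automatically "more performance": the serial fraction of the application sets a hard ceiling.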

    I/O bound applications probably won't prefer one over the other. They are, after all, bound by I/O not CPU.
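    A quick way to see why I/O-bound work cares little about core count (a sketch I'm adding, using `time.sleep` to simulate I/O latency): threads that spend their time blocked overlap perfectly well even on a single core.

    ```python
    # Simulated I/O-bound work: each "request" mostly waits. Threads
    # overlap the waits, so total wall time is close to one wait rather
    # than the sum of all waits -- regardless of how many cores exist.
    import threading
    import time

    WAIT = 0.2   # stand-in for network/disk latency, in seconds
    N = 8

    def fake_io_request(results, i):
        time.sleep(WAIT)          # blocked: the CPU is free meanwhile
        results[i] = i * i

    results = [None] * N
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io_request, args=(results, i))
               for i in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    print(f"{N} overlapped waits finished in {elapsed:.2f}s")
    ```

    Serially these eight requests would take about 1.6 seconds; overlapped, the wall time stays near 0.2 seconds, which is why extra CPUs buy little for this workload.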

    For compute-bound applications, it depends on the nature of the computation. If you're doing a lot of floating-point work you may benefit far more from offloading calculations to a GPU (e.g. using Nvidia CUDA). You can get a huge performance benefit from this; take a look at the GPU client for Folding@Home for an example.
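    Whether on GPU or CPU, compute-bound work only gains from extra cores if it actually executes in parallel. As a hedged sketch (my addition): in CPython that means processes rather than threads, because the GIL serializes CPU-bound threads.

    ```python
    # Compute-bound work split across worker processes. The chunking and
    # function names here are illustrative, not from the original answer.
    from multiprocessing import Pool

    def partial_sum_of_squares(bounds):
        lo, hi = bounds
        return sum(i * i for i in range(lo, hi))

    def total_sum_of_squares(n, workers=4):
        # Divide [0, n) into one contiguous chunk per worker process.
        step = n // workers
        chunks = [(k * step, (k + 1) * step if k < workers - 1 else n)
                  for k in range(workers)]
        with Pool(workers) as pool:
            return sum(pool.map(partial_sum_of_squares, chunks))

    if __name__ == "__main__":
        n = 100_000
        assert total_sum_of_squares(n) == sum(i * i for i in range(n))
        print("parallel and serial results agree")
    ```

    The same decomposition idea (split the data, compute partial results, combine) is what GPU offload frameworks like CUDA apply at a much finer grain, across thousands of lightweight threads.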

    In short, your question doesn't lend itself to a specific answer because the subject is complicated and there's just not enough information. Technical architecture is something that has to be designed for the specific application.