I've been trying to study distributed caching for some time and have not been able to clarify certain concerns, listed below:
I'm a bit of a novice on these things, so some of the concerns listed above might not even make sense. Suggestions and corrections on those are welcome as well.
My best attempt for answers:
I would say both. A distributed cache means that the logical idea of a cache is spread across multiple distinct machines. For example, you might have 5 nodes in your cache, and each node is on its own machine/VM.
Usually once you need a distributed cache, your application is also distributed. Small website = one server, maybe one cache node. Big website = many web servers, distributed cache.
Most distributed caches spread the cache entries across the nodes, and many also replicate each entry to one or more additional nodes. The idea is that any single cache node can be taken out of the "cluster" and you don't lose any data.
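As a rough sketch of what the cache does for you under the hood (the node stores and replica count here are made up; real caches manage this internally):

```python
def replicated_put(key, value, node_stores, replicas=2):
    """Write `key` -> `value` to `replicas` different nodes so the entry
    survives any single node failure. `node_stores` is a list of plain
    dicts standing in for each node's in-memory store (hypothetical)."""
    primary = hash(key) % len(node_stores)
    for i in range(replicas):
        # Copy to the primary node and its next neighbor(s) in the ring.
        node_stores[(primary + i) % len(node_stores)][key] = value

nodes = [{}, {}, {}]  # three fake cache nodes
replicated_put("user:42", {"name": "Ada"}, nodes)
```

With `replicas=2`, losing any one of the three nodes still leaves a copy of the entry somewhere else.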
The idea of storing a given cache entry on exactly one machine is called sharding (or partitioning): you look at the cache key and use it to decide which cache node the entry lives on.
For existing distributed caches, you shouldn't have to manage/worry about any of this though.
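To make the sharding idea concrete, here's a minimal sketch. The hashing scheme and node names are just illustrative; real cache clients often use consistent hashing instead, so that adding or removing a node doesn't remap most keys:

```python
import hashlib

def pick_node(key, nodes):
    """Hash the cache key and map it to one node (simple modulo sharding)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

nodes = ["cache-1", "cache-2", "cache-3"]  # hypothetical node names
# The same key always lands on the same node:
assert pick_node("user:42", nodes) == pick_node("user:42", nodes)
```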
In regards to distributed caches, they should be on their own machines with no other processes running. Caches usually reside in memory, so you don't want other things competing for that precious RAM.
You could technically put a web server on the same machine as a cache node, but just be aware they will compete for physical resources.
Don't worry about it. =) Each distributed cache behaves differently, so it's good to read up on it, but they all handle the replication of data on their own. You shouldn't have to worry about it/manage it.
I would maintain one logical cache that is distributed across many machines. Again the reason for this is in case a node goes down. If your cache node goes down and it had values that don't exist anywhere else, then you're in big trouble. (Database might get overwhelmed serving requests that the cache was handling.)
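That failure mode is easiest to see in the standard cache-aside read path; `cache` and `db` below are hypothetical stand-ins for your cache client and database:

```python
def get_user(user_id, cache, db):
    """Cache-aside read: try the cache first, fall back to the database
    on a miss, then repopulate the cache for the next reader."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:
        # If a cache node dies and takes its entries with it, every one
        # of these misses becomes a database read -- which is how the
        # database gets overwhelmed.
        value = db.get(user_id)
        cache.set(key, value)
    return value
```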
Good question. =) If the boxes are on the same internal network, the cost is really low: a round trip on a LAN is typically well under a millisecond. As long as the cache isn't on the west coast while the web servers are on the east coast, you should be fine. There is a price to pay of course, but there are creative ways to reduce it (batching requests, keeping a small local near-cache, etc.).
Hope that helps!