MongoDB sharding and replica sets: I am getting confused

I have a few questions that I could not answer myself.

What I understood so far:

Replication: I can replicate data, so that on failover, my data can be accessed by another instance. Clear so far.

Sharding: I can split my data into shards so that, if my dataset becomes too large, the additional data can be stored on other machines.

In MongoDB I need 3 config servers, at least one master server (mongos; I probably need 3 to avoid being affected by a failover) and at least one data server (mongod; probably 3 to survive a failover) that contains the data.

My Questions:

What kind of hardware should I begin with on the data servers? (What size in GB would be good to start with for the data directory?)
Is it good to run the config servers on the data instances, or on extra instances?
Where should I run the mongos? I could run it on one of the 3 instances (config and/or data servers), but is that a good idea?
How do I know early enough that a data server needs a new instance? (Before it is full?)
How many replica sets do I need? (Or what does that depend on?)
I have ZooKeeper running on 3 completely separate servers. Could I run my config servers on them as well, as long as the performance is fine, or is that a no-go?

Solution

What kind of hardware should I begin with on the data servers?

This is impossible to answer without knowing your working set. The amount of RAM MongoDB needs is roughly the size of your working set, that is, the data and indexes your queries touch regularly.
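
To make the sizing reasoning a bit more concrete, here is a minimal sketch that estimates a working set from numbers shaped like the `size` and `totalIndexSize` fields of the `collStats` output. The function name, the `hot_fraction` parameter and the sample numbers are assumptions made up for this example, and treating all indexes as hot is a simplification; only you know which share of your data is actually read regularly.

```python
GIB = 1024 ** 3


def estimate_working_set_bytes(coll_stats, hot_fraction):
    """Rough working set estimate: all indexes plus the hot share of the data.

    coll_stats: list of dicts with "size" (data bytes) and
                "totalIndexSize" (index bytes), one dict per collection.
    hot_fraction: assumed share (0..1) of the data that is accessed regularly.
    """
    if not 0 <= hot_fraction <= 1:
        raise ValueError("hot_fraction must be between 0 and 1")
    data_bytes = sum(s["size"] for s in coll_stats)
    index_bytes = sum(s["totalIndexSize"] for s in coll_stats)
    # Assumption: every index is hot, but only part of the data is.
    return index_bytes + int(data_bytes * hot_fraction)


# Hypothetical numbers: 40 GiB of data, 4 GiB of indexes, a quarter of it hot.
stats = [{"size": 40 * GIB, "totalIndexSize": 4 * GIB}]
estimate = estimate_working_set_bytes(stats, hot_fraction=0.25)
print(f"estimated working set: {estimate / GIB:.1f} GiB")
```

If the estimate is larger than the RAM of one data server, that is a hint that you need more memory per machine or more shards.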

Is it good to run the config servers on the data instances, or on extra instances?

I would personally run them on extra instances. For failover reasons, you don't want your config servers going down together with some random shard replica.

Where should I run the mongos? I could run it on one of the 3 instances (config and/or data servers), but is that a good idea?

The mongos is nothing more than a router for queries, and it is normally a good idea to put these on your application servers, so that your application talks to a local mongos (or a few of them), which then routes the queries to the cluster. One small note: a mongos can use a fair amount of CPU and memory if you send a lot of aggregation queries through it.
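
As a small illustration of the "application talks to a local mongos" idea, the sketch below only builds the connection string an application might use. The helper name, the database name and the second host name are hypothetical; 27017 is the default MongoDB port.

```python
def mongos_uri(hosts, database, port=27017):
    """Build a MongoDB connection string that lists one or more mongos routers."""
    if not hosts:
        raise ValueError("at least one mongos host is required")
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{host_list}/{database}"


# An application server talking to the mongos running on the same machine:
local_uri = mongos_uri(["localhost"], "appdb")

# The same application with a second (hypothetical) router listed as well:
fallback_uri = mongos_uri(["localhost", "app2.example.internal"], "appdb")

print(local_uri)
print(fallback_uri)
```

The resulting string would then be handed to your driver, for example `pymongo.MongoClient(local_uri)` if you use Python.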

How do I know early enough that a data server needs a new instance?

This depends on where your servers are hosted. On AWS, for example, you can set up alerts that automatically bring a new shard online and configure it when the total available disk space in the cluster reaches a tipping point. However, this all depends on where your servers are and who hosts them, so you will need to look into it yourself.
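
Whatever the provider, the check behind such an alert is simple. Here is a minimal sketch of it; the function names, the 70% threshold and the sample numbers are assumptions for illustration, not a recommendation from MongoDB or AWS.

```python
def needs_new_shard(used_bytes, total_bytes, threshold=0.7):
    """Return True once cluster disk usage reaches the assumed tipping point."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes >= threshold


def days_until_full(used_bytes, total_bytes, growth_bytes_per_day):
    """Estimate the days left before the disks are full, or None if not growing."""
    if growth_bytes_per_day <= 0:
        return None
    return (total_bytes - used_bytes) / growth_bytes_per_day


# Hypothetical cluster: 750 GB used out of 1000 GB, growing by 25 GB per day.
GB = 1000 ** 3
alert = needs_new_shard(750 * GB, 1000 * GB)
days_left = days_until_full(750 * GB, 1000 * GB, 25 * GB)
print(alert, days_left)
```

The second function is there because the threshold alone does not tell you whether you have days or months left; adding a shard and rebalancing data takes time, so the alert should fire well before the disks are full.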

How many replica sets do I need?

One per shard. Basically each shard should be a replica set.
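
To show what "one replica set per shard" means for the number of machines, here is a small sketch that lays out a cluster. The replica set names and host names are made up for this example, three members per set is an assumption, and 27018 is the default port of a mongod started as a shard server.

```python
def cluster_layout(num_shards, members_per_shard=3):
    """Map each shard's replica set name to the list of its mongod members."""
    layout = {}
    for shard in range(num_shards):
        rs_name = f"shard{shard}rs"  # hypothetical naming scheme
        layout[rs_name] = [
            f"{rs_name}-node{member}.example.internal:27018"
            for member in range(members_per_shard)
        ]
    return layout


# Two shards, each a replica set with three members: six mongod processes.
layout = cluster_layout(2)
total_mongods = sum(len(members) for members in layout.values())
for rs_name, members in layout.items():
    print(rs_name, members)
```

So the number of replica sets follows directly from the number of shards, and the number of shards follows from how much data and load you need to spread out.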

I have ZooKeeper running on 3 completely separate servers. Could I run my config servers on them as well, as long as the performance is fine, or is that a no-go?

I have not used ZooKeeper enough to be able to answer that.