google-app-engine, virtualization

How did Google Cloud achieve scalability through virtualization?


I have a question about how Google App Engine achieves scalability through virtualization. For example, when we deploy a cloud app to Google App Engine and the number of users grows over time, I think Google will automatically spin up a new virtual server to handle user requests. At first the cloud app runs on one virtual server, and later it runs on two. Google achieved scalability through virtualization so that any one system in the Google infrastructure can run an application’s code—even two consecutive requests posted to the same application may not go to the same server.

Does anyone know how an application can run on two virtual servers at Google? How are requests distributed between the two virtual servers, and how do they synchronize data, share CPU resources, and so on?

Is there any document from Google that explains this problem and its virtualization implementation?


Solution

  • This is in no way a specific answer, since we have no idea how Google does this. But I can explain how a load balancer works in Apache, which operates on a similar concept. Heck, maybe Google is using a variant of Apache load balancing. Read more here.

    Basically, a simple Apache load-balancing structure consists of at least 3 servers: 1 head load balancer & 2 mirrored servers. The load balancer is basically the traffic cop for outside-world traffic. Any public request made to a website that uses load balancing will actually be requesting the “head” machine.

    On that load balancing machine, configuration options basically determine which slave servers behind the scenes send content back to the load balancer for delivery. These “slave” machines are basically regular Apache web servers that are—perhaps—IP restricted to only deliver content to the main head load balancer machine.
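
    As an illustration, here is a rough sketch of what such a head-machine setup could look like with Apache’s `mod_proxy_balancer`. The hostnames and parameter values are made up for the example—this is a generic Apache pattern, not anything Google-specific:

    ```apache
    # Hypothetical head-machine config: two mirrored slaves behind one balancer.
    <Proxy "balancer://mycluster">
        # timeout: how long to wait on a slave; retry: how long to sideline a dead one
        BalancerMember "http://slave1.internal:80" timeout=5 retry=30
        BalancerMember "http://slave2.internal:80" timeout=5 retry=30
        ProxySet lbmethod=byrequests
    </Proxy>
    # All public traffic hits the head machine & is proxied to whichever slave answers.
    ProxyPass        "/" "balancer://mycluster/"
    ProxyPassReverse "/" "balancer://mycluster/"
    ```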

    So assume both slave servers in a load-balancing structure are 100% the same. The load balancer will randomly choose one to grab content from, & if it can grab the content in a reasonable amount of time, that “slave” now becomes the source. If for some reason the slave machine is slow, the load balancer decides, “Too slow, moving on!” and goes to the next machine. And it basically makes a decision like that for each request.
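
    That per-request decision can be sketched in a few lines of Python. This is a toy simulation with made-up latencies and a hypothetical `fetch` helper—not how Apache is actually implemented:

    ```python
    import random

    def fetch(server, request, timeout=1.0):
        """Stand-in for proxying `request` to a slave; None means it was too slow."""
        latency, body = server[request]
        return None if latency > timeout else body

    def balance(servers, request):
        """Try randomly ordered slaves until one answers within the timeout."""
        for server in random.sample(servers, len(servers)):
            body = fetch(server, request)
            if body is not None:
                return body  # this slave becomes the source for this request
        raise RuntimeError("no slave answered in time")

    # Two mirrored slaves: identical content, but one is slow right now.
    slave_a = {"/index.html": (0.2, "hello from mirror")}
    slave_b = {"/index.html": (5.0, "hello from mirror")}  # "Too slow, moving on!"

    print(balance([slave_a, slave_b], "/index.html"))  # prints: hello from mirror
    ```

    Because the content is mirrored, it doesn’t matter which slave ends up answering—the client gets the same response either way.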

    The net result is the faster & more accessible server is what is served first. But because the content is all proxied behind the load balancer the public accesses, nobody in the outside world knows the difference.

    Now let’s say the site behind a load balancer is so heavily trafficked that more servers need to be added to the cluster. No problem! Just clone the existing slave setup to as many new machines as possible, adjust the load balancer to know that these slaves exist & let it manage the proxy.

    Now the hard part is really keeping all machines in sync. And that is all dependent on site needs & usage. So a DB heavy website might use MySQL mirroring for each DB on each machine. Or maybe have a completely separate DB server that itself might be mirroring & clustering to other DBs.

    All that said, the key to Google’s success is how well their load-balancing infrastructure works. It’s not easy & I have no clue what they do. But I am sure the basic concepts outlined above are applied in some way.