Basically, my dilemma is this. I have a list of x servers that host files. Another server hosts the site's MySQL database and the application. When a file is uploaded (to the frontend server), the application checks which file server has the most free space and moves the file there. This works fine if you start with two or more empty servers with identical amounts of free space. If you introduce another server at a later point, one with more free space than the current servers, this method isn't so effective: all new files will be uploaded exclusively to the new server, which would be overloaded because it handles most of the new traffic until it catches up with the rest of the boxes in terms of free space.
So I thought to introduce a weighting system as well, which would help normalize the distribution of files. If the 3 servers are each set at 33% and one server has significantly more free space, it would still get more uploads than the others (even though it has the same weight), but the load would be spread across all the servers.
Can anyone suggest a good PHP-only implementation of this?
One approach would be to sum all available space on all of the servers that have the space to hold the file (so a server with available space but not enough to hold the file would obviously be excluded). Then determine the percentage of that space that each server accounts for (so a new server would account for a proportionally larger percentage). Use a random number and align it with the percentages to determine which server to pick.
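A minimal PHP sketch of that approach, assuming free space is already available as an array of server name => free megabytes. The function name `pickServer` and the array layout are made up for illustration, and `random_int` plus the arrow function assume PHP 7.4 or later. Instead of converting to percentages, it draws a point within the total free space directly, which gives the same proportions:

```php
<?php
/**
 * Pick a server at random, weighted by free space.
 *
 * $freeSpaceMb is a hypothetical map of server name => free space in MB.
 * Returns the chosen server name, or null if no server can hold the file.
 */
function pickServer(array $freeSpaceMb, int $fileSizeMb): ?string
{
    // Keep only the servers that can actually hold the file.
    $candidates = array_filter(
        $freeSpaceMb,
        fn ($free) => $free >= $fileSizeMb
    );
    if ($candidates === []) {
        return null;
    }

    // Draw a point in [1, total]; each server owns a slice as wide as its free space.
    $point = random_int(1, (int) array_sum($candidates));

    foreach ($candidates as $name => $free) {
        if ($point <= $free) {
            return (string) $name;
        }
        $point -= $free; // step past this server's slice
    }

    return null; // not reached: the slices cover the whole range
}

// Example call with made-up numbers: "b" should win roughly three times out of four.
echo pickServer(['a' => 100, 'b' => 300], 50), PHP_EOL;
```

Because the draw is proportional rather than winner-takes-all, a newly added server gets most of the uploads but not all of them.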
For instance, consider having four servers with the following free space levels:
Server 1: 2048MB
Server 2: 51400MB
Server 3: 1134MB
Server 4: 140555MB
You need to store a 1500MB file. That knocks Server 3 out of the running, leaving us with 194003MB total free space.
Each remaining server's share of that total is:
Server 1: 1.1%
Server 2: 26.5%
Server 4: 72.4%
You then choose a random number between 0 and 100, say 40, and compare it against the cumulative ranges:
Numbers between 0 and 1.1 (inclusive) would go to Server 1
Numbers > 1.1 and <= 27.6 would go to Server 2
Numbers > 27.6 and <= 100 would go to Server 4
So in this case, 40 means it gets stored on Server 4.
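To check the arithmetic, here is a small sketch that maps a given roll onto the cumulative percentage ranges. The function name `serverForRoll` and the array keys are hypothetical, and the roll is passed in rather than generated so the result is reproducible (PHP 7.4 or later assumed):

```php
<?php
/**
 * Map a roll between 0 and 100 to a server using cumulative percentage ranges.
 * $freeSpaceMb (server name => free MB) and $roll are illustrative inputs.
 */
function serverForRoll(array $freeSpaceMb, int $fileSizeMb, float $roll): ?string
{
    // Exclude servers that cannot hold the file.
    $candidates = array_filter($freeSpaceMb, fn ($free) => $free >= $fileSizeMb);
    $total = array_sum($candidates);
    if ($total <= 0) {
        return null;
    }

    $upperBound = 0.0;
    $last = null;
    foreach ($candidates as $name => $free) {
        // Add this server's percentage share to the running upper bound.
        $upperBound += 100 * $free / $total;
        $last = (string) $name;
        if ($roll <= $upperBound) {
            return $last;
        }
    }

    // Guards against floating-point rounding at the very top of the range.
    return $last;
}

$servers = [
    'Server 1' => 2048,
    'Server 2' => 51400,
    'Server 3' => 1134,
    'Server 4' => 140555,
];
echo serverForRoll($servers, 1500, 40.0), PHP_EOL; // Server 4
```

In real use the roll would come from a random source; a float roll such as `mt_rand() / mt_getrandmax() * 100` is one option.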