I have a lot of data of users. Each one has a unique user id which is represented by a base64 of six characters. for example (Fga6=gY). I used a hash function to spread the users across my MySQL machines.
I have four machines, so when I want to save a new user, I just check on which server i should save it:
hash(userId) % 4
My question is how can I add more machines in case I have too much load on the 4 machines. if I will start using 6 machines and use modulo 6 instead of 4 it will work for new users. but when try to query old users my application server will be forwarded to the wrong machine.
How can I solve it for a system that is already in production?
How can I plan a system better to allow me to scale out the machines whenever it's needed?
Too bad you did not ask before implementing that simple hash.
First
I strongly recommend writing a script to move one user from one shard to another shard. It will be an important component in most of all the following advice.
Alternatives to Hash
A "dictionary lookup" is where you have the users listed in a table, together with which shard they are on. This allows for moving an individual user from one shard to another. (The details of such a "move" may be quite complex.) This allows a variety of good things: Moving a few users from a busy server to a less busy one; adding a new shard; outfitting a new shard, then move users to the new server. Etc.
A compromise between the "dictionary" and the "hash" works like this: Hash the user_id into, say, 512 possible values. Then use a dictionary of only 512 entries to find what shard those users are on. This requires moving a set of users, not just a single user, but keeps the dictionary at a fixed size.
In either case, the "dictionary" needs to be propagated to all clients for them to use.
Small help
Rehashing involves moving essentially all the data around. This is quite costly in downtime. Instead, ponder this. Hash mod 8 (specifically 2*4). Then split one shard into two servers. This gives pain for only 1/4 of the users. Actually, with the "move one user" script, only 1/8 of users will be impacted. There is, of course, some amount of code in the clients to understand that some shards have 1/4th of the users; some have 1/8th. (This leads you from 4->8, and not your requested 4->6.)
Alternative topology
If you clients are on the same servers as the shards, consider moving the app code onto other servers. That is, have only MySQL on the database shards; have other stuff on other servers. And configure them differently (RAM, IOPs, etc) if needed. Your system may actually need more than 4 or fewer than 4 client machines for the load in question. This topology allows scaling each separately.
Optimization
Do you have the slowlog turned on? With a low value for long_query_time
? Run pt-query-digest
against that log. Then, let's discuss the first few queries. They may be running slower than they could be. https://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog
This might let you postpone adding a shard.