Tags: server, scalability

What is the right way to interpret horizontal scaling?


When I have a server (say S), does horizontal scaling imply:

Add many such servers (each of them doing the same job): S1, S2, S3, ..., then let all requests to our service come to the server S, which distributes them among the servers S1, S2, S3, ...

Is this interpretation correct? If so, can every web service be considered scalable, since we can always add more servers and distribute the work among them?


Solution

  • Given a load balancer in front of a server S, horizontal scaling means that you replicate S into S1, S2, S3, S4 and give the balancer some kind of "policy" that decides how traffic is distributed among them.

    One real-world example I have done in production: create a VM, install nginx as the balancer, set it to use round robin (the easiest policy to configure and reason about), then configure some machines and install my service on them (in my case with Docker).
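    As a minimal sketch of that setup (the IP addresses, ports and file path are placeholders), the nginx side is just an upstream block plus a proxy_pass; round robin is nginx's default policy, so it needs no extra directive:

        # /etc/nginx/conf.d/balancer.conf -- placeholder addresses and ports
        upstream my_service {
            server 10.0.0.11:8080;   # S1
            server 10.0.0.12:8080;   # S2
            server 10.0.0.13:8080;   # S3
        }

        server {
            listen 80;

            location / {
                # each incoming request is handed to one of the upstream servers in turn
                proxy_pass http://my_service;
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
            }
        }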

    Note: if your resources (RAM, CPU, disk space) allow you to install more than one instance of your service on a machine (S1, S2, etc.), do it, so you can handle more traffic.
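    For example, assuming a Docker image named my-service (a hypothetical name) that listens on port 8080 inside the container, extra instances on the same machine only need different host ports; each host:port pair is then listed as its own server line in the balancer's upstream block:

        # three instances of S on one machine, reachable on ports 8081-8083
        docker run -d --name my-service-1 -p 8081:8080 my-service:latest
        docker run -d --name my-service-2 -p 8082:8080 my-service:latest
        docker run -d --name my-service-3 -p 8083:8080 my-service:latest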

    If you have Kubernetes, this is solved by setting how many instances of a given pod you want, but you are not forced to use Kubernetes for a small setup (say, for example, 10 machines or 10 instances of S).
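    As a rough sketch (the names and image are placeholders), the Kubernetes equivalent is the replicas field of a Deployment; a Service in front of it then plays the role nginx plays in the manual setup:

        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: my-service              # placeholder name
        spec:
          replicas: 4                   # how many instances of S (pods) to run
          selector:
            matchLabels:
              app: my-service
          template:
            metadata:
              labels:
                app: my-service
            spec:
              containers:
                - name: my-service
                  image: my-service:latest   # placeholder image
                  ports:
                    - containerPort: 8080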

    For reference: