I understand horizontal scaling, vertical scaling, and sharding. What I want to understand better is what happens to the application when it is not scaled, i.e. the effects of not scaling, rather than how to solve the problem by scaling.
Here are my doubts:
Generally, every request has a timeout, and these timeouts exist at most layer boundaries (Browser -> HTTP server, HTTP server -> application server / microservices layer, application -> database). When load increases to the point where some layer cannot service a request before its timeout expires, the user does not get a response, and the application is effectively broken.
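To make the "load exceeds capacity, then timeouts" effect concrete, here is a minimal simulation sketch. It assumes a single worker, FIFO ordering, fixed arrival intervals and fixed service times in integer milliseconds. The function and field names are hypothetical. Below capacity, latency stays constant. Above capacity, every request waits a little longer than the one before it, until latency crosses the timeout.

```python
from dataclasses import dataclass


@dataclass
class Result:
    request_id: int
    latency_ms: int   # time from arrival to completion
    timed_out: bool   # True if the caller would have given up first


def simulate(num_requests: int, arrival_interval_ms: int,
             service_ms: int, timeout_ms: int) -> list:
    """Single worker, FIFO queue, deterministic arrivals and service times.

    Assumption: the worker keeps processing a request even after the caller
    has timed out, which is common when a layer does not cancel abandoned work.
    """
    results = []
    worker_free_at = 0
    for i in range(num_requests):
        arrival = i * arrival_interval_ms
        start = max(arrival, worker_free_at)   # wait if the worker is busy
        finish = start + service_ms
        worker_free_at = finish
        latency = finish - arrival
        results.append(Result(i, latency, latency > timeout_ms))
    return results


# Under capacity: one request every 200 ms, each takes 100 ms.
under = simulate(20, 200, 100, 1000)
# Over capacity: one request every 100 ms, each takes 200 ms.
over = simulate(20, 100, 200, 1000)
first_timeout = next(r for r in over if r.timed_out)
print(first_timeout)  # Result(request_id=9, latency_ms=1100, timed_out=True)
```

In the overloaded case latency grows by 100 ms per request (200, 300, 400, ...), so from request 9 onward every caller times out even though the server is still busy working.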
Depending on where the timeout occurs, you may be able to return a useful error, or the user may just see a generic "hang" where the application appears frozen or broken in some way.
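One way a layer can turn a timeout into a useful error instead of a hang is to enforce its own deadline on the downstream call and map the timeout to an explicit response. The sketch below uses `asyncio.wait_for`; `call_downstream` and `handle_request` are hypothetical stand-ins for a real service call and handler, and the 504 status is just one reasonable choice.

```python
import asyncio


async def call_downstream(delay_s: float) -> str:
    """Hypothetical downstream dependency (e.g. a database query) taking delay_s seconds."""
    await asyncio.sleep(delay_s)
    return "payload"


async def handle_request(downstream_delay_s: float, timeout_s: float) -> tuple:
    """Enforce a deadline at this layer boundary and return an explicit error
    response when it expires, rather than leaving the caller waiting."""
    try:
        body = await asyncio.wait_for(call_downstream(downstream_delay_s),
                                      timeout=timeout_s)
        return 200, body
    except asyncio.TimeoutError:
        # wait_for cancels the downstream coroutine, so this layer stops waiting on it.
        return 504, "Gateway Timeout: downstream did not respond in time"


print(asyncio.run(handle_request(0.01, 0.5)))  # (200, 'payload')
print(asyncio.run(handle_request(0.5, 0.05)))  # (504, 'Gateway Timeout: ...')
```

If no layer sets a deadline like this, the outermost timeout (often the browser's or a proxy's) is the only one that fires, which is where the generic "hang" comes from.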
If enough requests are waiting to be serviced, and you have raised all the timeouts to unreasonably high values, more and more threads will be allowed to queue up. Each of these threads uses memory (at least its stack), and eventually you will run out of memory and be unable to create additional threads, at which point the application will once again hang and become unresponsive.
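A rough back-of-the-envelope sketch of that failure mode: if requests arrive faster than they are served and timeouts are effectively infinite, the backlog of parked threads grows every second, and so does the memory they hold. All numbers here are hypothetical, and the per-thread cost is an assumption (real stack sizes depend on the OS and runtime settings). Only waiting threads are counted, so the result is an optimistic upper bound on how long the process survives.

```python
from typing import Optional


def time_to_exhaust_memory(arrival_per_s: int, served_per_s: int,
                           kib_per_thread: int, memory_limit_kib: int,
                           max_seconds: int = 10_000) -> Optional[int]:
    """Return the first second at which waiting threads exceed the memory limit.

    Assumptions (hypothetical model): each request that cannot be served
    immediately parks one thread, timeouts never fire so no waiting request
    is dropped, and each parked thread costs a fixed kib_per_thread.
    Returns None if the backlog never exceeds the limit within max_seconds.
    """
    waiting = 0
    for second in range(1, max_seconds + 1):
        # The backlog grows by the excess of arrivals over completed work.
        waiting = max(waiting + arrival_per_s - served_per_s, 0)
        if waiting * kib_per_thread > memory_limit_kib:
            return second
    return None


FOUR_GIB_IN_KIB = 4 * 1024 * 1024
# 500 req/s arriving, 400 req/s served, assumed 1 MiB per parked thread.
print(time_to_exhaust_memory(500, 400, 1024, FOUR_GIB_IN_KIB))  # 41
# Under capacity the backlog never builds up.
print(time_to_exhaust_memory(300, 400, 1024, FOUR_GIB_IN_KIB))  # None
```

A 25% overload leaves this hypothetical process with about 40 seconds before thread creation starts failing. Raising timeouts does not fix the overload. It only changes the symptom from timeouts to memory exhaustion.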