I am looking for a load balancer that will direct load based on the user.
For example, I have a REST API that supports 3 different document types and runs on 10 servers. Either each document type cannot take up more than 5 servers, or, as another option, each document type is given 3 servers. I don't want to lose requests; I want to queue them if possible. I am pretty sure a load balancer like this already exists, but I cannot find its name or an implementation. Even better, is there an AWS ELB that can already handle this?
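You can put a Web Application Firewall (AWS WAF) in front of an Application Load Balancer and then use a rate-based rule like this one to limit traffic that exceeds the specified threshold within a 5-minute window. Note that the rule as written uses the Count action, which only records matching requests; change it to Block to actually reject them.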
{
  "Priority": 0,
  "Action": {
    "Count": {}
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": { "Fn::Sub": "${Site}-Overall-Rate-Limit" }
  },
  "Name": { "Fn::Sub": "${Site}-Overall-Rate-Limit" },
  "Statement": {
    "RateBasedStatement": {
      "Limit": { "Ref": "OverallRateLimit" },
      "AggregateKeyType": "IP"
    }
  }
}
Note that WAF is not free. You pay for the web ACL, for each rule, and for the requests that are evaluated.
Another option here is to use a CDN like CloudFront to offload content delivery to your users. It's pretty much impossible to DoS a CDN like CloudFront. Whether this method is appropriate for you depends on whether your content is static and shared or dynamic and unique to each client.
Having read your edits, I'll offer you another path. If your goal is to route different traffic to different servers, you can do that with ALB Listener Rules. https://docs.aws.amazon.com/elasticloadbalancing/latest/application/listener-update-rules.html
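Clients would have to send something the rule can match on, such as a request header (a path or query string also works), so that the listener can route them to the correct Target Group on the backend.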
You could also enable sticky sessions on the target group to keep a client on a specific server beyond that, but stickiness can lead to uneven load across the servers.
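As a rough illustration of the header-based routing described above, here is a minimal CloudFormation sketch of one listener rule. The header name X-Document-Type, the value invoice, the resource name, and both parameters are hypothetical placeholders, not anything from your setup; you would supply the ARNs of your own listener and target group.

```json
{
  "Description": "Sketch only: routes requests carrying a hypothetical X-Document-Type header to a dedicated target group",
  "Parameters": {
    "ListenerArn": {
      "Type": "String",
      "Description": "ARN of an existing ALB listener (assumed to exist already)"
    },
    "InvoiceTargetGroupArn": {
      "Type": "String",
      "Description": "ARN of the target group holding the servers for this document type (assumed)"
    }
  },
  "Resources": {
    "InvoiceRoutingRule": {
      "Type": "AWS::ElasticLoadBalancingV2::ListenerRule",
      "Properties": {
        "ListenerArn": { "Ref": "ListenerArn" },
        "Priority": 10,
        "Conditions": [
          {
            "Field": "http-header",
            "HttpHeaderConfig": {
              "HttpHeaderName": "X-Document-Type",
              "Values": ["invoice"]
            }
          }
        ],
        "Actions": [
          {
            "Type": "forward",
            "TargetGroupArn": { "Ref": "InvoiceTargetGroupArn" }
          }
        ]
      }
    }
  }
}
```

Repeating a rule like this once per document type, with a different priority and target group each time, would roughly correspond to the "3 servers per document type" option in the question, since each target group only contains the servers you register in it.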