I'm experiencing the following problem: we host an ecommerce site on an Amazon EC2 medium instance with an RDS db instance, and it normally runs great. But because we work with timed product "releases", some users are using bots to automatically add items to the basket and check them out, which makes the website run really slow and then crash, basically like a DDoS attack. Initially only a few users were doing this, so I found their IPs in the access log and blocked them. Now the word is spreading and I can't keep adding IPs to a blacklist manually; I need a "professional" way of doing this. A friend suggested using Cloudflare, but I'm asking whether there is a way to do this internally in AWS or with Apache directly. Thanks in advance.
maestroosram,
This sounds like a problem you can't solve with the usual anti-scraping methods, such as blacklisting and rate limiting.
Why not:
Blacklisting: Depending on what kind of IPs they are (hosting providers, open proxies), you could use one of the blacklists you can easily find on the internet (these are pretty good: https://www.iblocklist.com/).
BUT, once they get blocked, they will switch to other solutions until they find an IP address (or thousands of them) that is not blacklisted.
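For illustration, applying such a list server-side is just a matter of checking the client IP against the downloaded ranges. Here is a minimal Python sketch; the CIDR ranges are documentation examples, not a real blocklist, and the exact export format you get from a provider like iblocklist may differ:

```python
import ipaddress

# Example ranges only; in practice you would export these from a
# blocklist provider (the values below are documentation addresses).
BLOCKED_RANGES = [ipaddress.ip_network(cidr) for cidr in (
    "198.51.100.0/24",   # e.g. a hosting-provider range
    "203.0.113.0/24",    # e.g. an open-proxy range
)]

def is_blocked(client_ip: str) -> bool:
    """Return True if the client IP falls inside any blocked range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_RANGES)

# Reject the request before it reaches the checkout logic.
if is_blocked("203.0.113.42"):
    print("403 Forbidden")
```

The same lists can also be loaded into Apache (e.g. `Require not ip` rules) or an AWS WAF IP set, but the matching logic is the same idea.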
Rate limiting: You could also try to block IP addresses that perform more than x requests per minute or hour. But since the bots are distributed over a large number of IP addresses precisely to avoid detection, limiting per IP is not very helpful.
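For completeness, the per-IP counting looks roughly like the sketch below (thresholds and in-memory storage are illustrative; on Apache you would typically reach for a module such as mod_evasive, or a shared store like Redis if you run multiple instances):

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60    # sliding window length (illustrative)
MAX_REQUESTS = 120     # allowed requests per IP per window (illustrative)

_hits = defaultdict(deque)  # ip -> timestamps of recent requests

def allow_request(ip: str) -> bool:
    """Record a hit for this IP; return False once it exceeds the limit."""
    now = time.time()
    q = _hits[ip]
    q.append(now)
    # Drop timestamps that have fallen out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    return len(q) <= MAX_REQUESTS
```

As said, a botnet spread over thousands of IPs will stay comfortably under any per-IP threshold, which is why this alone won't save you.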
What you can do is implement a good CAPTCHA system and see what happens. This can stop these bots, but keep in mind that there are plenty of CAPTCHA solvers out there (http://www.scrapesentry.com/scraping-wiki/common-methods-tools-break-captcha/).
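As one concrete option (not the only one), if you went with Google's reCAPTCHA you would verify the submitted token server-side in the checkout handler before accepting the order. A sketch, assuming the `requests` library and a secret key kept in an environment variable:

```python
import os
import requests

RECAPTCHA_SECRET = os.environ["RECAPTCHA_SECRET"]  # your reCAPTCHA secret key

def captcha_passed(captcha_token: str, client_ip: str) -> bool:
    """Verify the reCAPTCHA token submitted with the checkout form."""
    resp = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={
            "secret": RECAPTCHA_SECRET,
            "response": captcha_token,
            "remoteip": client_ip,  # optional extra signal
        },
        timeout=5,
    )
    return resp.json().get("success", False)

# In the checkout handler (pseudocode):
# if not captcha_passed(form["g-recaptcha-response"], request_ip):
#     return "403 Forbidden"
```

Put it only on the add-to-basket/checkout steps so normal browsing stays untouched.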
Another option is to block sessions whose session_id is shared across multiple IP addresses. This is risky, though, since some ISPs balance traffic across several gateways, so a legitimate user can appear to change IP mid-session.
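If you do experiment with that, the detection itself is just counting distinct IPs per session. A sketch below; the threshold is a guess and would need careful tuning precisely because of those load-balanced ISP gateways:

```python
from collections import defaultdict

MAX_IPS_PER_SESSION = 3   # illustrative threshold; tune before blocking anyone

_session_ips = defaultdict(set)  # session_id -> distinct client IPs seen

def suspicious_session(session_id: str, client_ip: str) -> bool:
    """Flag a session_id that is being reused from too many different IPs."""
    ips = _session_ips[session_id]
    ips.add(client_ip)
    return len(ips) > MAX_IPS_PER_SESSION
```

I would start by only logging the flagged sessions and reviewing them by hand before turning it into an automatic block.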