I am building a web scraper as a small project (using CodeIgniter). Due to CORS policy, I am not allowed to get data from some sites.
To bypass that, I am using Rob Wu's CORS Anywhere. I'm prepending the cors_url to the URL I'm scraping data off of.
Everything works fine until I hit the limit of 200 requests per hour. After the 200th request, I get HTTP status code 429 (Too Many Requests).
Screenshot showing Network log.
As per the documentation, we can deploy our own instance of server.js on Heroku. What I want to do instead is set it up locally alongside my local Apache server (localhost), just to test things out first.
Some sample code:
var url = "http://example.com/";
var cors_url = "https://cors-anywhere.herokuapp.com/";
$.ajax({
method:'GET',
url : cors_url + url,
success : function(response){
//data_scraping_logic...
}
});
npm install cors-anywhere
node cors
- now it's running on localhost:8080

// Listen on a specific host via the HOST environment variable
var host = process.env.HOST || '0.0.0.0';
// Listen on a specific port via the PORT environment variable
var port = process.env.PORT || 8080;
var cors_proxy = require('cors-anywhere');
cors_proxy.createServer({
originWhitelist: [], // Allow all origins
// requireHeader: ['origin', 'x-requested-with'],
// removeHeaders: ['cookie', 'cookie2']
}).listen(port, host, function() {
console.log('Running CORS Anywhere on ' + host + ':' + port);
});
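
Once the local proxy is running, the only client-side change should be the prefix that gets prepended to the target URL. A minimal sketch, assuming the proxy is reachable at localhost:8080 as in the snippet above; `LOCAL_CORS_URL` and `proxiedUrl` are hypothetical names used only for illustration, not part of CORS Anywhere:

```javascript
// Assumption: the local CORS Anywhere instance listens on localhost:8080.
var LOCAL_CORS_URL = 'http://localhost:8080/';

// Hypothetical helper: prefix a target URL with the proxy's base URL.
function proxiedUrl(targetUrl, proxyBase) {
  var base = proxyBase || LOCAL_CORS_URL;
  // Make sure the proxy base ends with a slash before appending the target.
  if (base.charAt(base.length - 1) !== '/') {
    base += '/';
  }
  return base + targetUrl;
}

console.log(proxiedUrl('http://example.com/'));
// prints "http://localhost:8080/http://example.com/"
```

In the $.ajax call from the question, `url: proxiedUrl(url)` would then take the place of `cors_url + url`, so requests go through the local instance instead of the shared Heroku demo and its hourly limit.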