I'm using wget to grab something from the web, but I don't want it to follow the links in a portion of the page. I thought I could set up a proxy that removes the parts of the page I don't want processed before returning it to wget, but I'm not sure how to accomplish that.
Is there a proxy that lets me easily modify the HTTP response in Python or Node.js?
There are several ways to achieve this. The following should get you started with Node.js. In this example I fetch google.com and replace all instances of "google" with "foobar" before sending the response back to the client.
// package.json file...
{
  "name": "proxy-example",
  "description": "a simple example of modifying response using a proxy",
  "version": "0.0.1",
  "dependencies": {
    "request": "1.9.5"
  }
}
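// server.js file...
var http = require("http")
var request = require("request")

var port = process.env.PORT || 8001

http.createServer(function(req, rsp){
  // Always fetch the same page, whatever the client asked for
  var options = { uri: "http://google.com" }
  request(options, function(err, response, body){
    // On a failed fetch body is undefined, so report the error instead
    if (err) {
      rsp.writeHead(502)
      return rsp.end("upstream request failed: " + err.message)
    }
    // Rewrite the body before handing it back to the client
    rsp.writeHead(200)
    rsp.end(body.replace(/google/g, "foobar"))
  })
}).listen(port)

console.log("listening on port " + port)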
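
To use this with wget, the proxy would need to fetch whatever URL wget asked for instead of a fixed page, and cut out the unwanted part rather than replace a word. Here is a rough sketch of that idea using only Node's built-in http module. The names stripSection and createFilteringProxy, and the idea of delimiting the unwanted part with a start and end marker string, are my own assumptions for illustration. It only handles plain http:// requests (https goes through CONNECT and can't be rewritten this way), and it only rewrites responses served as text/html.

```javascript
var http = require("http")

// Hypothetical helper: remove everything from startMarker through endMarker
// (markers included). Returns the input unchanged if either marker is missing.
function stripSection(html, startMarker, endMarker) {
  var start = html.indexOf(startMarker)
  if (start === -1) return html
  var end = html.indexOf(endMarker, start + startMarker.length)
  if (end === -1) return html
  return html.slice(0, start) + html.slice(end + endMarker.length)
}

// Hypothetical factory: builds a forward proxy for plain HTTP requests.
function createFilteringProxy(startMarker, endMarker) {
  return http.createServer(function (req, rsp) {
    // A client talking to an HTTP proxy sends the absolute URL as req.url
    if (!/^http:\/\//.test(req.url)) {
      rsp.writeHead(400)
      return rsp.end("expected an absolute http:// URL")
    }
    http.get(req.url, function (upstream) {
      var type = upstream.headers["content-type"] || ""
      // Pass anything that is not HTML through untouched
      if (type.indexOf("text/html") === -1) {
        rsp.writeHead(upstream.statusCode, upstream.headers)
        return upstream.pipe(rsp)
      }
      // Buffer the whole page, strip the section, then reply
      var chunks = []
      upstream.on("data", function (chunk) { chunks.push(chunk) })
      upstream.on("end", function () {
        var body = Buffer.concat(chunks).toString("utf8")
        rsp.writeHead(upstream.statusCode, { "Content-Type": type })
        rsp.end(stripSection(body, startMarker, endMarker))
      })
    }).on("error", function (err) {
      rsp.writeHead(502)
      rsp.end("upstream request failed: " + err.message)
    })
  })
}
```

To try it, call createFilteringProxy(start, end).listen(8001) with marker strings taken from the page you are fetching, then run wget with the http_proxy environment variable set to http://localhost:8001. Since the unwanted links never reach wget, it has nothing to follow there.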