How to set up proxymesh with Scrapy?


I have some Scrapy code that I'd like to start using with proxymesh. The proxymesh documentation says cryptically:

For the scrapy crawling framework, you must set the http_proxy environment variable [...] then activate the HttpProxyMiddleware.

I understand how to set the http_proxy environment variable, but how to "activate the HttpProxyMiddleware" is not totally obvious from the documentation. I think I need to add the following to settings.py in my Scrapy project:

DOWNLOADER_MIDDLEWARES = {
    'myproject.middleware.ProxyMeshMiddleware': 100,
}

But then I presume I also need to add some actual middleware code, presumably in a middleware.py file?

I found this gist, so I guess I could just copy and paste that into middleware.py, but I'm not sure whether it's accurate. It seems to use different environment variables from what's recommended in the proxymesh documentation.


Solution

  • The gist you are referring to reads its ProxyMesh settings from OS environment variables; otherwise it is a slightly modified copy of Scrapy's basic HttpProxyMiddleware and should work well.

    You can also look at my very simple ProxyMesh middleware implementation, https://github.com/mizhgun/scrapy-proxymesh, which supports proxy rotation (if your ProxyMesh plan has multiple endpoints) and a customisable timeout.
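For reference, here is a minimal sketch of what a middleware.py could look like. It is not the code from the gist or from the repository above. Scrapy's built-in HttpProxyMiddleware is enabled by default and picks up the http_proxy / https_proxy environment variables, so a custom class is only needed if you want explicit control. The class name ProxyMeshMiddleware matches the settings entry in the question; reading the URL from http_proxy and the example.com host in the comments are assumptions made for illustration.

```python
import os


class ProxyMeshMiddleware:
    """Hypothetical downloader middleware that sends every request through one proxy."""

    def __init__(self, proxy_url):
        # Expected form: "http://USER:PASS@HOST:PORT" (placeholder values, not real ones).
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # Assumption: the proxy URL is stored in the http_proxy environment variable,
        # e.g. export http_proxy=http://user:pass@proxy.example.com:31280
        return cls(os.environ.get("http_proxy"))

    def process_request(self, request, spider):
        # Leave requests alone if no proxy is configured or one was already chosen.
        if self.proxy_url and "proxy" not in request.meta:
            request.meta["proxy"] = self.proxy_url
        # Returning None lets Scrapy continue with the remaining middlewares.
        return None
```

With the DOWNLOADER_MIDDLEWARES entry from the question (priority 100), this runs before the built-in HttpProxyMiddleware (priority 750 by default), which then handles the credentials embedded in the proxy URL.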