python-3.x, web-scraping, scrapy, splash-screen

Changing Scrapy/Splash user agent


How can I set the user agent for Scrapy with Splash, equivalent to what the following requests code does?

import requests
from bs4 import BeautifulSoup

ua = {"User-Agent":"Mozilla/5.0"}
url = "http://www.example.com"
page = requests.get(url, headers=ua)
soup = BeautifulSoup(page.text, "lxml")

My spider would look similar to this:

import scrapy
from scrapy_splash import SplashRequest


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["https://www.example.com/"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                self.parse,
                args={'wait': 0.5}
            )

Solution

  • Set the user_agent attribute on the spider to override the default user agent:

    class ExampleSpider(scrapy.Spider):
        name = 'example'
        user_agent = 'Mozilla/5.0'
    

    In this case UserAgentMiddleware (which is enabled by default) uses the spider's user_agent attribute instead of the USER_AGENT setting, so requests are sent with 'Mozilla/5.0'.

    You can also override headers per request:

    scrapy_splash.SplashRequest(url, headers={'User-Agent': custom_user_agent})