Tags: python, python-2.7, wxpython, web-scraping, scrapy

Setting Scrapy start_urls from a Script


I have a working Scrapy spider and can run it from a separate script by following the example here. I have also built a wxPython GUI for the script; it contains only a multi-line TextCtrl where users enter a list of URLs to scrape, and a submit button. Currently the start_urls are hardcoded in my spider. How can I pass the URLs entered in the TextCtrl to the start_urls list in my spider? Thanks in advance for the help!


Solution

  • Just set start_urls on your Spider instance:

    spider = FollowAllSpider(domain=domain)
    spider.start_urls = ['http://google.com']
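To connect this to the GUI from the question, the text box content has to become a list first. The following is a minimal sketch, assuming users type one URL per line. The helper `urls_from_text` and the `StandInSpider` class are hypothetical names introduced here so the snippet runs without Scrapy or wxPython installed; in the real script the string would come from the TextCtrl's `GetValue()` and the list would be assigned to the actual spider instance.

```python
def urls_from_text(raw_text):
    """Turn multi-line text-box content into a list of URLs, one per line.

    Hypothetical helper: assumes one URL per line and does no validation.
    """
    urls = []
    for line in raw_text.splitlines():
        url = line.strip()
        if url:  # skip blank lines
            urls.append(url)
    return urls


class StandInSpider(object):
    """Hypothetical stand-in for the real spider, so the sketch runs alone."""
    start_urls = []


# In the GUI, this string would be the value returned by text_ctrl.GetValue().
raw_text = "http://example.com\n\n  http://example.org/page  \n"

spider = StandInSpider()
spider.start_urls = urls_from_text(raw_text)
```

Stripping whitespace and dropping blank lines keeps stray newlines in the text box from ending up as empty entries in `start_urls`.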