I am trying to scrape a site, but when I run the script I get the following error:
'NotSupported: Unsupported URL scheme '': no handler available for that scheme'
If the rule is not wrong, why does this error occur, and what do you suggest? Please help. Thanks a lot.
Here is the code:
from scrapy.spiders import CrawlSpider, Rule, BaseSpider
from scrapy.linkextractors import LinkExtractor

class FellowSearch(CrawlSpider):
    name = 'fellow'
    allowed_domains = ['emma.cam.ac.uk']
    start_urls = [' https://www.emma.cam.ac.uk/']
    rules = (Rule(LinkExtractor(allow=(r'\?id=\d+$')), callback='parse_obj', follow=True),)

    def parse_obj(self, response):
        print response.url
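You need to remove the space before https in your start_urls. Because of that leading space, the URL is read as having an empty scheme (the '' in the error message), and Scrapy has no download handler for it.
Change it to:
start_urls = ['https://www.emma.cam.ac.uk/']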
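To catch this kind of typo before the crawl starts, the URLs can be cleaned and checked up front. The following is only a minimal sketch in Python 3 using the standard library; the helper name clean_start_urls is hypothetical and not part of Scrapy, and limiting the accepted schemes to http and https is an assumption made for this example.

```python
from urllib.parse import urlparse


def clean_start_urls(urls):
    """Strip surrounding whitespace and reject URLs without an http(s) scheme.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    cleaned = []
    for url in urls:
        url = url.strip()  # removes the stray leading/trailing spaces
        scheme = urlparse(url).scheme
        # Assumption: only http and https URLs are expected here.
        if scheme not in ("http", "https"):
            raise ValueError("Missing or unsupported URL scheme in %r" % url)
        cleaned.append(url)
    return cleaned


# The URL from the question, including the accidental leading space.
start_urls = clean_start_urls([' https://www.emma.cam.ac.uk/'])
```

In the spider, the class attribute could then be written as start_urls = clean_start_urls([...]), so a URL with no scheme at all fails immediately with a readable message instead of surfacing later as a NotSupported error.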