I need some clarification about SgmlLinkExtractor in Scrapy.
For the link example.com/YYYY/MM/DD/title, I would write:
Rule(SgmlLinkExtractor(allow=[r'\d{4}/\d{2}/\d{2}/\w+']), callback='parse_example')
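A minimal sketch of how the date pattern behaves, using Python's re module directly rather than Scrapy. It assumes, as far as I know Scrapy's link extractors do, that each allow pattern is applied with re.search to the absolute URL, so the pattern only has to match somewhere inside it. The helper name and sample URLs are hypothetical.

```python
import re

# Pattern from the Rule above: 4 digits / 2 digits / 2 digits / a word.
DATE_PATTERN = re.compile(r'\d{4}/\d{2}/\d{2}/\w+')

def is_date_article(url):
    """Return True if DATE_PATTERN occurs anywhere in url (unanchored re.search)."""
    return DATE_PATTERN.search(url) is not None

# Hypothetical sample URLs for illustration.
print(is_date_article('http://example.com/2013/05/21/some-title'))  # True
print(is_date_article('http://example.com/news/economic/title'))    # False
```

Because the search is unanchored, \w+ only needs one word character after the date, so a hyphenated title such as some-title still matches even though \w itself does not match -.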
For the link example.com/news/economic/title, should I write
r'\news\category\w+'
or r'\news\w+/\w+'?
(The category changes, but the URL always contains news.)
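One way to express the news case, as a sketch: write the slashes literally and let a character class stand in for the changing category. Using [^/]+ (anything except a slash) instead of \w+ is my own choice here, so that categories or titles containing hyphens are still covered; the sample URLs are hypothetical.

```python
import re

# Literal "/news/", then one path segment for the category, then "/" and the title.
NEWS_PATTERN = re.compile(r'/news/[^/]+/[^/]+')

urls = [
    'http://example.com/news/economic/title',
    'http://example.com/news/sports-and-games/other-title',
    'http://example.com/article/title',
]
for url in urls:
    # Prints True for the two news URLs and False for the article URL.
    print(url, bool(NEWS_PATTERN.search(url)))
```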
For the link example.com/article/title, should I write
r'\article\w+'?
(The URL always contains article.)
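A similar sketch for the article case, again assuming unanchored re.search and using hypothetical URLs. Requiring at least one character after /article/ means a bare /article/ index page is not picked up.

```python
import re

# Literal "/article/" followed by a non-empty title segment.
ARTICLE_PATTERN = re.compile(r'/article/[^/]+')

def is_article(url):
    """Return True if url contains /article/<something>."""
    return ARTICLE_PATTERN.search(url) is not None

print(is_article('http://example.com/article/title'))        # True
print(is_article('http://example.com/article/'))             # False
print(is_article('http://example.com/news/economic/title'))  # False
```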
It's not possible to answer "should I" questions unless you provide complete example strings and state what you want to match (and what you don't want to match) with the regular expression.
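A small sketch of what "what you want to match and what you don't" can look like in practice: list both kinds of URL and report every one a pattern gets wrong. The helper name mismatches and the sample URLs are hypothetical, and matching uses unanchored re.search.

```python
import re

def mismatches(pattern, should_match, should_not_match):
    """Return the URLs that pattern handles wrongly."""
    regex = re.compile(pattern)
    wrong = [u for u in should_match if not regex.search(u)]
    wrong += [u for u in should_not_match if regex.search(u)]
    return wrong

date_ok = ['http://example.com/2013/05/21/title']
date_bad = ['http://example.com/2013/05/title',
            'http://example.com/news/economic/title']

print(mismatches(r'\d{4}/\d{2}/\d{2}/\w+', date_ok, date_bad))  # []
print(mismatches(r'\d{4}/\d{2}/', date_ok, date_bad))           # ['http://example.com/2013/05/title']
```

An empty list means the pattern agrees with every example; anything else shows exactly which URL it accepts or rejects by mistake.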
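My guess is that your regexes won't work because you use \ (backslash) instead of / (slash). Inside a regex a backslash starts an escape sequence, so the \n in your news patterns matches a newline character rather than the text /n.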
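A quick check in Python of what the backslash versions actually do. In r'\news\w+/\w+' the \n is a newline, so it never matches a URL; and since Python 3.7, an escape such as \c that re does not recognise makes re.compile raise re.error, which is what happens to r'\news\category\w+'. The helper name compiles is hypothetical.

```python
import re

url = 'http://example.com/news/economic/title'

def compiles(pattern):
    """Return True if re.compile accepts pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# \n is a newline escape, so this looks for "<newline>ews..." and finds nothing.
print(re.search(r'\news\w+/\w+', url))          # None

# \c is not a valid escape; Python 3.7+ rejects the whole pattern.
print(compiles(r'\news\category\w+'))           # False

# With forward slashes the same idea matches.
print(bool(re.search(r'/news/\w+/\w+', url)))   # True
```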
I recommend going to regex101 and testing whether your URLs match your regular expressions. See the following screenshot: