I am trying to get the links and category names from the NPR RSS feeds page, http://www.npr.org/rss/#feeds.
This is my XPath in the Scrapy shell:
a = sel.xpath('//ul[@class="rsslinks"]/li/a/@href').extract()
b = sel.xpath('//ul[@class="rsslinks"]/li/a/text()').extract()
But the length of b is one less than the length of a. I don't know what I am missing here, and it is causing problems in the data.
From the image below, the category name is "Most Emailed Stories" but the link is for "News Headlines".
Any help would be appreciated.
This is because of the first link in the results:
<a class="iconlink xml" href="/rss/rss.php?id=1001" target="blank"><strong>News Headlines</strong></a>
As you can see, it has no direct child text nodes, only a single strong element, so a/text() does not match it.
Add another slash to get all text nodes inside the a tag:
//ul[@class="rsslinks"]/li/a//text()
                         HERE^
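One caveat: a//text() returns one entry per text node, so the two lists can still drift apart if a link ever contains more than one text node. A safer pattern is to loop over each a element and read its href and text together. Here is a minimal sketch of that idea, using Python's standard xml.etree.ElementTree as a stand-in for the Scrapy selector so it runs without a crawl. The markup is a trimmed-down, hypothetical copy of the list (the second link and its id are made up). In Scrapy the equivalent would be to select //ul[@class="rsslinks"]/li/a and call .xpath('@href') and .xpath('.//text()') on each result.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down markup modelled on the link shown above;
# the second <li> and its id value are made up for illustration.
HTML = """
<ul class="rsslinks">
  <li><a class="iconlink xml" href="/rss/rss.php?id=1001" target="blank"><strong>News Headlines</strong></a></li>
  <li><a class="iconlink xml" href="/rss/rss.php?id=100">Most Emailed Stories</a></li>
</ul>
"""

root = ET.fromstring(HTML)          # root is the <ul> element
anchors = root.findall("./li/a")    # one entry per link

# a.text holds only the text directly inside <a> (comparable to a/text()).
# It is None for the first link, whose text sits inside <strong>.
direct = [a.text for a in anchors]

# itertext() walks every descendant text node (comparable to a//text()).
# Joining per element keeps each href paired with its own label.
pairs = [(a.get("href"), "".join(a.itertext()).strip()) for a in anchors]

for href, label in pairs:
    print(label, "->", href)
```

Because each tuple is built from a single a element, a missing or split text node can no longer shift the labels against the links.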