I'm making a crawler with Scrapy and wondering why my XPath doesn't work when my CSS selector does. I want to get the number of commits from this HTML:
<li class="commits">
<a data-pjax="" href="/samthomson/flot/commits/master">
<span class="octicon octicon-history"></span>
<span class="num text-emphasized">
521
</span>
commits
</a>
</li>
XPath:
response.xpath('//li[@class="commits"]//a//span[@class="text-emphasized"]//text()').extract()
CSS:
response.css('li.commits a span.text-emphasized').css('::text').extract()
CSS returns the number (unescaped), but XPath returns nothing. Am I using the // for nested elements correctly?
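Your use of // is fine; the problem is the class test. @class="text-emphasized" compares against the whole value of the span tag's class attribute, which here is "num text-emphasized", so nothing matches. The CSS selector span.text-emphasized only requires that text-emphasized be one of the classes. To get the same effect in XPath, use the contains function to check whether text-emphasized appears in the attribute: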
response.xpath('//li[@class="commits"]//a//span[contains(@class, "text-emphasized")]//text()').extract()[0].strip()
Otherwise, match the full class value, including num:
response.xpath('//li[@class="commits"]//a//span[@class="num text-emphasized"]//text()').extract()[0].strip()
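The difference between comparing the whole class attribute and checking for one class can be reproduced without Scrapy. The following is only a minimal sketch using Python's standard-library ElementTree, which supports a limited XPath subset (exact attribute comparison, but not contains()). The has_class helper is a made-up name for illustration and only approximates what the CSS selector does. Note also that contains() is a substring test, so it would match a class such as text-emphasized-light (a hypothetical name) as well.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, with the closing </li> tag completed.
SNIPPET = """<li class="commits">
<a data-pjax="" href="/samthomson/flot/commits/master">
<span class="octicon octicon-history"></span>
<span class="num text-emphasized">
521
</span>
commits
</a>
</li>"""

root = ET.fromstring(SNIPPET)

# Exact comparison against only part of the class value: no match.
partial = root.findall(".//span[@class='text-emphasized']")

# Exact comparison against the full class value: one match.
full = root.findall(".//span[@class='num text-emphasized']")


def has_class(element, name):
    """Hypothetical helper: True if name is one of the element's classes."""
    return name in element.get("class", "").split()


# Token-based check, roughly what the CSS selector span.text-emphasized does.
by_token = [span for span in root.iter("span") if has_class(span, "text-emphasized")]

print(len(partial), len(full), len(by_token))  # prints: 0 1 1
print(by_token[0].text.strip())                # prints: 521
```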
In both expressions, [0] takes the first result returned by the XPath and strip() removes the leading and trailing whitespace, leaving just the number.
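One caveat with [0]: if the XPath matches nothing, as the original expression in the question did, indexing an empty list raises IndexError. Below is a small sketch of a more defensive version of the "first element, then strip" step. It works on a plain list of strings, which is what extract() returns. The function name first_stripped and the sample input are assumptions made for illustration, not part of Scrapy.

```python
def first_stripped(values, default=None):
    """Return the first string in values without surrounding whitespace.

    Falls back to default when values is empty, instead of raising
    IndexError the way values[0] would.
    """
    for value in values:
        return value.strip()
    return default


# Assumed shape of the extracted text node: the number padded with newlines.
commits = first_stripped(["\n521\n"])

# An empty result, as returned by an XPath that matches nothing.
missing = first_stripped([])

# Convert to an integer once a value is known to be present.
count = int(commits) if commits is not None else 0

print(commits, missing, count)  # prints: 521 None 521
```

Recent Scrapy versions also let you call extract_first() on the selector list, which returns None rather than raising when nothing matches.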