xpath, web-crawler, scrapy, octicons

XPath doesn't get the text of a nested item, but CSS does


I'm building a crawler with Scrapy and can't work out why my XPath returns nothing when the equivalent CSS selector works. I want to get the number of commits from this HTML:

<li class="commits">
    <a data-pjax="" href="/samthomson/flot/commits/master">
        <span class="octicon octicon-history"></span>
        <span class="num text-emphasized">
          521
        </span>
        commits
    </a>
  </li>

Xpath:

response.xpath('//li[@class="commits"]//a//span[@class="text-emphasized"]//text()').extract()

CSS:

response.css('li.commits a span.text-emphasized').css('::text').extract()

CSS returns the number (unescaped), but XPath returns nothing. Am I using the // for nested elements correctly?


Solution

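  • The test @class="text-emphasized" compares against the whole attribute value, but the span's class attribute is "num text-emphasized", so nothing matches. (The CSS selector span.text-emphasized only requires that class to be one of the element's classes.) Use the contains() function to check only that text-emphasized appears in the attribute: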

    response.xpath('//li[@class="commits"]//a//span[contains(@class, "text-emphasized")]//text()').extract()[0].strip()
    

    Alternatively, match the full class value, including num:

    response.xpath('//li[@class="commits"]//a//span[@class="num text-emphasized"]//text()').extract()[0].strip()
    

    Here [0] takes the first result returned by the XPath query, and strip() removes the leading and trailing whitespace, leaving just the number.
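
    To see the exact-match behaviour without running a crawl, here is a minimal sketch using Python's standard-library xml.etree.ElementTree instead of Scrapy (my substitution, not what the question uses). It assumes the snippet from the question with the closing </li> repaired. ElementTree's XPath subset has no contains(), so the "class is one of several" check is done in plain Python here; the variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

# The markup from the question, with the closing </li> tag completed.
HTML = """
<li class="commits">
    <a data-pjax="" href="/samthomson/flot/commits/master">
        <span class="octicon octicon-history"></span>
        <span class="num text-emphasized">
          521
        </span>
        commits
    </a>
</li>
"""

root = ET.fromstring(HTML.strip())

# [@class='...'] compares the WHOLE attribute string, so a partial value fails.
exact_partial = root.find(".//span[@class='text-emphasized']")

# Spelling out the full attribute value matches the span.
exact_full = root.find(".//span[@class='num text-emphasized']")

# Stand-in for contains(): keep spans whose class list includes the token.
by_token = [
    span for span in root.iter("span")
    if "text-emphasized" in span.get("class", "").split()
]

# The text node carries the surrounding newlines and indentation; strip() trims them.
commits = by_token[0].text.strip()

print(exact_partial)           # None
print(exact_full is not None)  # True
print(commits)                 # 521
```

    One thing to keep in mind: XPath's contains(@class, "text-emphasized") is a plain substring test, so it would also match a class such as "text-emphasized-foo". The token check above is stricter because it splits the attribute on whitespace first.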