python-3.x, scrapy, scrapy-shell

Why is scrapy printing \t\n\n where I expect there to be text?


I am a beginner with Scrapy, but learning. I have been parsing this page and am attempting to scrape the address from it.

I have done this in the scrapy shell, so I start by:

scrapy shell https://www.marksandspencer.com/MSStoreDetailsView?storeId=10151&langId=-24&SAPStoreId=6952

Which works fine. Then I attempt to parse the address with:

response.xpath('//li[@class="address"]/text()').extract()

But my output is the following:

['\n\t\t', '\n\t\t\n\t\t']

Why am I not able to see the address as it appears on the page:

BELFAST ABBEY CENTRE, 1 Old Glenmount Road Newtonabbey, Newton Abbey, BT36 7DN

How would I go about getting this address out? I appreciate anyone that takes the time to reply.


Solution

  • There are a couple of errors in how you are approaching this:

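    1. When using scrapy shell, you have to wrap the URL in quotes (""). Otherwise most shells treat each & in the URL as a control operator: everything before the first & is run as a background command and the rest is interpreted as separate commands, so Scrapy never receives the full URL: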

      scrapy shell "https://www.marksandspencer.com/MSStoreDetailsView?storeId=10151&langId=-24&SAPStoreId=6952"
      
    2. Your XPath is not correct: /text() selects only the text nodes that are direct children of the li, and those contain nothing but whitespace (the \n and \t in your output). The address itself sits in child elements of that li, so you could use:

      response.xpath('//li[@class="address"]//text()').extract()
      

      or

      response.xpath('//li[@class="address"]/p/text()').extract()