python web-scraping scrapy

My Scrapy text results keep returning " \n \n"


I am trying to scrape some search results from

https://www.companiesintheuk.co.uk/Company/Find?q=a

with the command

response.css('div.search_result_title').extract()

This works, but when I try to remove the HTML tags with

response.css('div.search_result_title::text').extract()

I keep getting a list that is almost entirely \n strings:

[u'\n', u'\n(Dissolved)\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n', u'\n']

Do you guys know why? Thanks!


Solution

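  • Do you want to get the header text? There is an <a> element inside the div, and ::text only returns the div's own direct text nodes, which are mostly whitespace, so you get a lot of empty data. Use div.search_result_title a::text instead.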

    And for the second question, about getting the whole block's text:

    for i in response.css('div.searchResult'):
        # Join the non-empty text nodes of each result block.
        print(' '.join(j.strip() for j in i.css('::text').extract() if j.strip()))
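The strip-and-filter step in the loop above can be tried without a live response. This is only a minimal sketch: the helper name join_text is hypothetical (it is not part of Scrapy), and the input is the first few entries of the list shown in the question.

```python
def join_text(fragments):
    """Join text fragments, dropping entries that are only whitespace."""
    stripped = (fragment.strip() for fragment in fragments)
    return ' '.join(text for text in stripped if text)


# First few entries of the list from the question (assumed representative).
fragments = ['\n', '\n(Dissolved)\n', '\n', '\n']
print(join_text(fragments))  # (Dissolved)
```

In a spider you would pass it the result of i.css('::text').extract() for each block, which keeps the cleanup logic in one place.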