I'm trying to extract the text from this HTML structure:
<div class="col-6 col-lg-3">
<span class="font-weight-bold">List of Birds</span>
<ul class="bird-forms">
<li>Crow <span class="color">Black</span></li>
<li>Peacock <span class="color">Multicolored</span></li>
<li>Dove <span class="color">Multicolored</span></li>
<li>Sparrow <span class="color">Brown</span></li>
<li>Goose <span class="color">Multicolored</span></li>
<li>Ostrich <span class="color">Multicolored</span></li>
</ul>
</div>
Using scrapy shell: response.css('ul.bird-forms li ::text').extract()
I want the result to look like this:
['Crow Black',
'Peacock Multicolored',
'Dove Multicolored',
'Sparrow Brown',
'Goose Multicolored',
'Ostrich Multicolored']
Instead of this:
['Crow',
'Black',
'Peacock',
'Multicolored',
'Dove',
'Multicolored',
'Sparrow',
'Brown',
'Goose',
'Multicolored',
'Ostrich',
'Multicolored']
Simply use the XPath string() function, which returns the concatenated text of an element and all its descendants:
birds = []
for li in response.xpath('//ul[@class="bird-forms"]/li'):
    # string(.) joins every text node under this <li> into one string
    bird = li.xpath('string(.)').get()
    birds.append(bird)
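To see why this gives one string per item, here is a minimal sketch that reproduces the idea without Scrapy. It uses the standard library's xml.etree.ElementTree rather than Scrapy's selectors, and it assumes the markup is well-formed enough to parse as XML (true for the snippet in the question, not for HTML in general). The names HTML and li_texts are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The markup from the question. It happens to be well-formed XML,
# which is an assumption this sketch relies on.
HTML = """
<div class="col-6 col-lg-3">
<span class="font-weight-bold">List of Birds</span>
<ul class="bird-forms">
<li>Crow <span class="color">Black</span></li>
<li>Peacock <span class="color">Multicolored</span></li>
<li>Dove <span class="color">Multicolored</span></li>
<li>Sparrow <span class="color">Brown</span></li>
<li>Goose <span class="color">Multicolored</span></li>
<li>Ostrich <span class="color">Multicolored</span></li>
</ul>
</div>
"""


def li_texts(markup):
    """Return one string per <li>, joining all text nodes beneath it."""
    root = ET.fromstring(markup.strip())
    items = root.findall(".//ul[@class='bird-forms']/li")
    # itertext() yields each descendant text node in document order;
    # joining them mirrors what XPath string(.) does for an element.
    return ["".join(li.itertext()) for li in items]


birds = li_texts(HTML)
print(birds)
```

The ::text selector in the question returns each text node as a separate list entry, which is why 'Crow' and 'Black' come back split. Selecting the li elements first and taking the string value of each one keeps the pieces together. If the real page has line breaks or extra spaces inside the items, XPath's normalize-space(.) can be used in place of string(.) to collapse them.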