I hope you are doing well.
<ul>
<li>
<s>Title:</s>
De Aardappeleters
</li>
<li>
<s>Dimensions:</s>
82 x 114 cm
</li>
<li>
<s>Media:</s>
canvas
</li>
<li>
<s>Style:</s>
Realism
</li>
<li>
<s>Date:</s>
1885
</li>
<!-- On this page, the Genre item is located here, last in the list. -->
<li>
<s>Genre:</s>
Modern
</li>
</ul>
I have the HTML block above ☝ and want to extract the text of one of its li elements. Unfortunately, that li has no class or ID to select on. The block comes from a website.
<li>
<s>Genre:</s>
Modern
</li>
I want to select the Genre list item and get its text: 👇
Modern
The main problem is that the items in this block appear in a different order on another page: 👇
<ul>
<li>
<s>Title:</s>
De Aardappeleters
</li>
<li>
<s>Dimensions:</s>
82 x 114 cm
</li>
<li>
<s>Media:</s>
canvas
</li>
<!-- On the other page, the Genre item is located here instead. -->
<li>
<s>Genre:</s>
Modern
</li>
<li>
<s>Style:</s>
Realism
</li>
<li>
<s>Date:</s>
1885
</li>
</ul>
OriginalTagFind = layout.css('article ul li s::text').getall()
TitleOriginal = [tag.strip() for tag in OriginalTagFind if tag.startswith('Genre:')]
My idea is to go to the s tag I have already selected and then print the text of its parent li, or of its next sibling. Is that possible?
With a CSS selector you can use:
'li:has(s):contains("Genre:")::text'
With an XPath selector you can use:
"//li[s[contains(text(), 'Genre')]]/text()"
Both are demonstrated on your example below:
In [1]: html = """<ul>
...: <li>
...: <s>Title:</s>
...: De Aardappeleters
...: </li>
...: <li>
...: <s>Dimensions:</s>
...: 82 x 114 cm
...: </li>
...: <li>
...: <s>Media:</s>
...: canvas
...: </li>
...: <li>
...: <s>Style:</s>
...: Realism
...: </li>
...: <li>
...: <s>Date:</s>
...: 1885
...: </li>
...: <li>
...: <s>Genre:</s>
...: Modern
...: </li>
...: </ul> """
In [2]: import scrapy
In [3]: selector = scrapy.Selector(text=html)
In [4]: ''.join(selector.xpath("//li[s[contains(text(), 'Genre')]]/text()").getall()).strip()
Out[4]: 'Modern'
In [5]: ''.join(selector.css('li:has(s):contains("Genre:")::text').getall()).strip()
Out[5]: 'Modern'
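On the "next sibling" idea from the question: the value you want is the text node that comes right after the s tag, so looking it up by label works wherever the item sits in the list. The sketch below is not Scrapy code. It uses only Python's standard library to check that logic offline, and it assumes the fragment parses as well-formed XML, which real pages often do not. `PAGE_A`, `PAGE_B`, and `field_text` are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical fragments with the same fields in a different order,
# mirroring the two page layouts from the question.
PAGE_A = """<ul>
<li>
<s>Title:</s>
De Aardappeleters
</li>
<li>
<s>Style:</s>
Realism
</li>
<li>
<s>Genre:</s>
Modern
</li>
</ul>"""

PAGE_B = """<ul>
<li>
<s>Title:</s>
De Aardappeleters
</li>
<li>
<s>Genre:</s>
Modern
</li>
<li>
<s>Style:</s>
Realism
</li>
</ul>"""


def field_text(markup, label):
    """Return the text that follows <s>label</s>, or None if the label is absent."""
    root = ET.fromstring(markup)  # assumption: markup is well-formed XML
    for s in root.iter("s"):
        if (s.text or "").strip() == label:
            # In ElementTree, .tail is the text right after </s>,
            # i.e. the "next sibling" text node of the <s> element.
            return (s.tail or "").strip()
    return None


genre_a = field_text(PAGE_A, "Genre:")
genre_b = field_text(PAGE_B, "Genre:")
print(genre_a, genre_b)
```

Both layouts give the same value because the lookup keys on the label text rather than on the item's position. In a Scrapy selector, the same idea can be written with the XPath `following-sibling::text()` axis starting from the matched s element.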