I am struggling with the IMPORTXML function in Google Sheets. A specific webpage has this content:
<li class="myTag">
<a href="/xyz/abc/">
Jouets
<small class="hidden">
<span>235 6453 sous-section</span>
<span>1231 765 produits</span>
<span>12155 654 images</span>
</small>
</a>
</li>
<li class="myTag">
<a href="/xyz/def/">
Livres
<small class="hidden">
<span>235 6453 sous-section</span>
<span>1231 765 produits</span>
<span>12155 654 images</span>
</small>
</a>
</li>
I want to extract all li tags with class="myTag", but without the <small> section, so that I get just this list: Jouets, Livres.
I have written this XPath expression:
=IMPORTXML(B1; '//li[@class='myTag'].ReplaceAll("<[^>]*>", "", "")')
But I get a #N/A error.
How can I extract just the first elements (Jouets, Livres) without the <small> section?
Originality of this question
"Scraping data to Google Sheets from a website that uses JavaScript" is about dynamic web pages, which is not my case. My page is already generated as HTML content: I can already extract the full content of the <li> tag with the parameter '//li[@class='myTag']'.
My point is to exclude the <small> content, which ends up glued to the label.
I am looking to get only the labels (Jouets, Livres), and not the labels with the <small> text glued onto them.
Following your reply: currently, with this formula, I get this result (pasting just the first lines):
- Arts1 069 sous-thèmes15 778 articles11 949 médias
- Astronomie et astrophysique107 sous-thèmes570 articles696 médias
- Chimie147 sous-thèmes1 042 articles1 087 médias
But I am trying to get only:
- Arts
- Astronomie et astrophysique
- Chimie
How about the following XPath?
//li[@class='with-counter']/a/text()
=IMPORTXML("https://www.universalis.fr/classification/"; "//li[@class='with-counter']/a/text()")
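If you want to check the idea outside of Sheets, here is a minimal Python sketch run against the sample markup from the question. It is only an illustration under a few assumptions: the `<ul>` wrapper, the shortened second item, and the `link_labels` helper are mine, not part of the real page. Python's standard `xml.etree.ElementTree` does not support the `text()` step, so the sketch selects the `a` elements and reads `.text`, which is the text node that comes before the first child element, i.e. everything before `<small>`.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a <ul> root (an assumption)
# so that it parses as a single well-formed XML document.
SAMPLE = """<ul>
<li class="myTag">
<a href="/xyz/abc/">
Jouets
<small class="hidden">
<span>235 6453 sous-section</span>
<span>1231 765 produits</span>
<span>12155 654 images</span>
</small>
</a>
</li>
<li class="myTag">
<a href="/xyz/def/">
Livres
<small class="hidden">
<span>235 6453 sous-section</span>
</small>
</a>
</li>
</ul>"""


def link_labels(markup, css_class="myTag"):
    """Return the label of each <a> directly under a matching <li>.

    Hypothetical helper: a.text holds only the text before the first
    child element, so the <small> counters are left out.
    """
    root = ET.fromstring(markup)
    anchors = root.findall(f".//li[@class='{css_class}']/a")
    return [(a.text or "").strip() for a in anchors]


print(link_labels(SAMPLE))  # ['Jouets', 'Livres']
```

The design point is the same as in the XPath above: ask for the text node that is a direct child of `a` instead of the string value of the whole `li`, which concatenates every descendant.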