xpath, web-crawler, screen-scraping

XPath for when the position is not fixed


I've been using a specific XPath to scrape data with a crawler. The issue is that the XPath looks for an "li" tag in a specific order/position.

When other "li" tags appear before the one I'm targeting, the ordering changes and the XPath returns the wrong value.

Is there a way to set up the XPath differently to avoid this?

The XPath I've been using is:

//div[@class="info"]/div[1]/div[1]/div[1]/div[1]/ul[1]/li[2] 

against the HTML below.

The final "[2]" always selects the second "li" tag, which normally holds the "Thickness" value. Sometimes an additional "li" tag appears above it, and the XPath then returns the wrong value.

    <div class="info">
    <div class="accordion additional-info  info__block" >
    <h5 class="accordion__header additional-info__header info__header">
    Product details</h5>
    <div class="accordion__content additional-info__content info__content" >
    <div class="info-content" id="product-details">
    <div class="info__lists">
    <ul class="info__lists-not-bullets">
    <li><strong>Length:</strong>
    3600 mm</li>
    <li><strong>Thickness:</strong>
    45 mm</li>
    <li><strong>Width:</strong>
    95 mm</li>
    <li><strong>Thickness Imperial:</strong>
    1 3/4in</li>
    <li><strong>Width Imperial:</strong>
    3 3/4in</li>
    </ul>

Solution

  • Instead of relying on position, you could test the value of the child strong element in a predicate, so the li is selected only when its label is Thickness:

    //div[@class='info']//ul[@class='info__lists-not-bullets']/li[normalize-space(strong)='Thickness:']
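To see the difference between the positional and the label-based lookup, here is a minimal sketch using Python's standard-library ElementTree. It makes several assumptions: the markup is a trimmed, well-formed copy of the question's HTML, the extra "Grade:" item is hypothetical and only there to shift the positions, and `value_of` is an illustrative helper. ElementTree supports only a subset of XPath and has no normalize-space(), so the sketch uses the simpler predicate `[strong='Thickness:']`, which is enough here because the label has no surrounding whitespace.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the markup from the question, with a
# hypothetical extra <li> ("Grade:") inserted first to shift the positions.
HTML = """
<div class="info">
  <div class="info__lists">
    <ul class="info__lists-not-bullets">
      <li><strong>Grade:</strong>
      C16</li>
      <li><strong>Length:</strong>
      3600 mm</li>
      <li><strong>Thickness:</strong>
      45 mm</li>
      <li><strong>Thickness Imperial:</strong>
      1 3/4in</li>
    </ul>
  </div>
</div>
"""

root = ET.fromstring(HTML)

# Label-based lookup: pick the <li> whose <strong> child reads exactly
# "Thickness:". ElementTree has no normalize-space(), hence the plain
# [strong='Thickness:'] predicate instead of the one in the answer.
by_label = root.find(
    ".//ul[@class='info__lists-not-bullets']/li[strong='Thickness:']"
)

# Positional lookup, in the spirit of the original li[2].
by_position = root.find(".//ul[@class='info__lists-not-bullets']/li[2]")


def value_of(li):
    """Return the text that follows the <strong> label inside an <li>."""
    return (li.find("strong").tail or "").strip()


thickness = value_of(by_label)
second_item = value_of(by_position)

print(thickness)    # value found via the label
print(second_item)  # value found via position, now the wrong item
```

With a full XPath 1.0 engine (for example the `xpath()` method in lxml), the expression from the answer, including normalize-space(), can be used unchanged. An exact match on 'Thickness:' also avoids picking up the 'Thickness Imperial:' item, which a contains() test would match as well.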