Tags: xml, web-scraping, xpath, scrapy, contains

Search for specific text in XML tree and extract text in next node


I'm trying to scrape the weight of smartwatches from www.currys.co.uk. The website does not use the same structure for every product, so to get each product's weight I am trying a keyword search with XPath:

//text()[contains(.,'Weight')]

I can get the text "Weight", but what I want is the next node, which contains the actual weight value:

<tbody>
 <tr>
   <th scope = "row">Weight</th>
   <td> 26.7 g</td>
 </tr>
</tbody>

What I want is the text 26.7 g. I tried the expression below, but it returns nothing:

//text()[contains(.,'Weight')]//td
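A quick lxml check (assuming the snippet above is closed properly with </tr> and </tbody>) shows why this expression finds nothing: //text() selects the text node "Weight" itself, and a text node has no child elements, so the trailing //td can never match. If you want to keep the text()-based keyword search, one option is to step up to the parent th with .. before moving sideways to the td:

```python
from lxml import etree

# The question's snippet, with the missing </tr> and </tbody> closing tags added.
txt = '''<tbody>
 <tr>
   <th scope="row">Weight</th>
   <td> 26.7 g</td>
 </tr>
</tbody>'''
root = etree.fromstring(txt)

# The text node "Weight" has no child elements, so //td below it matches nothing.
broken = root.xpath("//text()[contains(., 'Weight')]//td")
print(broken)  # []

# Step up to the parent <th> with "..", then across to the sibling <td>.
fixed = root.xpath("//text()[contains(., 'Weight')]/../following-sibling::td/text()")
print([value.strip() for value in fixed])  # ['26.7 g']
```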

Any suggestions? Thanks in advance.


Solution

  • You can select the th element whose text contains "Weight" and then step to its following-sibling::td:

    from lxml import etree
    
    
    txt = '''<tbody>
     <tr>
       <th scope = "row">Weight</th>
       <td> 26.7 g</td>
     </tr>
    </tbody>'''
    
    root = etree.fromstring(txt)
    
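    # Select the <th> whose text contains "Weight", then the <td> that follows it
    for td in root.xpath('//th[contains(., "Weight")]/following-sibling::td'):
        print(td.text)  # td.text keeps the leading space from the markup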
    

    Prints:

     26.7 g
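Since the question mentions that product pages do not all share the same structure, a slightly more defensive variant may help. This is a sketch under assumptions, not something confirmed against the live site: it assumes some pages put the label in a td rather than a th, and that the value is always the first td after the label in the same row. The function name extract_weight and the sample rows are hypothetical.

```python
from typing import Optional

from lxml import html


def extract_weight(page_source: str) -> Optional[str]:
    """Return the text of the cell next to the first label containing 'Weight', or None.

    Hypothetical helper: assumes the label sits in a <th> or <td> and the value
    is the first <td> that follows it in the same row.
    """
    doc = html.fromstring(page_source)
    # Match the label in either a <th> or a <td>, then take only the next <td>.
    cells = doc.xpath(
        "//*[self::th or self::td][contains(normalize-space(.), 'Weight')]"
        "/following-sibling::td[1]"
    )
    if not cells:
        return None
    # text_content() gathers nested text (e.g. <span>26.7</span> g); strip() trims padding.
    return cells[0].text_content().strip()


# Hypothetical sample rows: one using <th>, one using <td>, one with no weight row.
page_a = '<table><tr><th scope="row">Weight</th><td> 26.7 g</td></tr></table>'
page_b = '<table><tr><td>Colour</td><td>Black</td></tr><tr><td>Weight</td><td>32 g </td></tr></table>'
page_c = '<table><tr><th>Colour</th><td>Black</td></tr></table>'

for page in (page_a, page_b, page_c):
    print(extract_weight(page))
```

Using following-sibling::td[1] instead of following-sibling::td keeps the result to a single cell if a row happens to contain more than one value column.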