
Find XPath of next sibling in Python


Here's the outerHTML of an element on a webpage:

    <td valign="top">
        <script type="text/javascript">sjcap();</script><p><input type="text" id="uword" name="uword" class="" size="20"></p><p><img src="/wps/PA_1_ATAGT15208O2F02M34340U0000/./cimg/31.jpg" width="290" height="80" alt=""></p>
    </td>

I am trying to build an XPath for the image and extract its src attribute using HTMLSession from requests_html. Here's my XPath, but it didn't match the element:

    //input[@id='uword']/following-sibling::p

I inspected the element and tried to find the XPath with Ctrl + F, but I got 0 results.


Solution

  • First, the HTML in your question is not well-formed XML (the <input> and <img> elements aren't closed). Second, the <p> element containing the <img> child is not a sibling of the <input> tag but of that tag's <p> parent, so following-sibling::p evaluated on the <input> finds nothing. Assuming the HTML is fixed like this:

    <td valign="top">
      <script type="text/javascript">sjcap();</script>
      <p>
        <input type="text" id="uword" name="uword" class="" size="20"/>
      </p>
      <p>
        <img src="/wps/PA_1_ATAGT15208O2F02M34340U0000/./cimg/31.jpg" width="290" height="80" alt=""/>
      </p>
    </td>
    

    The following XPath

    //p[./input[@id="uword"]]/following-sibling::p/img/@src
    

    or

    //p/input[@id="uword"]/../following-sibling::p/img/@src
    

    should output:

    /wps/PA_1_ATAGT15208O2F02M34340U0000/./cimg/31.jpg
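To see the structural point in isolation, here is a minimal sketch using only Python's standard library, run against the corrected markup above. Note that xml.etree.ElementTree implements only a small subset of XPath: it has no following-sibling axis and cannot select an attribute such as @src directly. The helper following_siblings is therefore a hypothetical stand-in that emulates that axis by position; it is not part of ElementTree or requests_html.

```python
import xml.etree.ElementTree as ET

# The corrected, well-formed markup from the answer.
FIXED_HTML = """
<td valign="top">
  <script type="text/javascript">sjcap();</script>
  <p>
    <input type="text" id="uword" name="uword" class="" size="20"/>
  </p>
  <p>
    <img src="/wps/PA_1_ATAGT15208O2F02M34340U0000/./cimg/31.jpg" width="290" height="80" alt=""/>
  </p>
</td>
"""


def following_siblings(parent, child):
    """Hypothetical helper: return the elements after `child` among `parent`'s children.

    ElementTree has no following-sibling axis, so this emulates it by position.
    """
    children = list(parent)
    return children[children.index(child) + 1:]


root = ET.fromstring(FIXED_HTML.strip())  # root is the <td>

# Step up from the input to its wrapping <p>, like "input[@id='uword']/.." in the XPath.
input_p = root.find(".//input[@id='uword']/..")
uword = input_p.find("input[@id='uword']")

# The question's XPath fails: the <input> has no siblings inside its <p>.
print(following_siblings(input_p, uword))  # []

# The wrapping <p>, however, is followed by the <p> that holds the image.
next_ps = [el for el in following_siblings(root, input_p) if el.tag == "p"]
src = next_ps[0].find("img").get("src")
print(src)  # /wps/PA_1_ATAGT15208O2F02M34340U0000/./cimg/31.jpg
```

With requests_html itself no helper should be needed: it evaluates XPath through lxml, which supports the full XPath 1.0 axes, so passing one of the XPath strings above to r.html.xpath(..., first=True), where r is the response from HTMLSession().get(...), should return the src value as a string.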