Tags: python, html, python-3.x, web-crawler, python-requests-html

Get the parent elements of a tag using Python requests-HTML


Is there any way to get all the parent elements of a tag using requests-HTML?

For example:

<!DOCTYPE html>
<html lang="en">
<body id="two">
    <h1 class="text-primary">hello there</h1>
    <p>one two tree<b>four</b>five</p>
</body>
</html> 

I want to get all the parents of the b tag: [html, body, p]

Or, for the h1 tag, get this result: [html, body]


Solution

  • With the excellent lxml:

    from lxml import etree
    html = """<!DOCTYPE html>
    <html lang="en">
    <body id="two">
        <h1 class="text-primary">hello there</h1>
        <p>one two tree<b>four</b>five</p>
    </body>
    </html> """
    tree = etree.HTML(html)
    # Find the first <b> element
    b_elt = tree.xpath('//b')[0]
    print(b_elt.text)
    # -> four
    # Walk up through the ancestors of this <b> element (nearest first)
    ancestors_tags = [elt.tag for elt in b_elt.iterancestors()]
    print(ancestors_tags)
    # -> ['p', 'body', 'html']
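
Note that iterancestors() yields the nearest ancestor first, while the question asks for the outermost element first ([html, body, p]). A minimal sketch of one way to get that order is below; the helper name ancestor_tags and the HTML_DOC constant are made up for illustration and are not part of lxml:

```python
from lxml import etree

HTML_DOC = """<!DOCTYPE html>
<html lang="en">
<body id="two">
    <h1 class="text-primary">hello there</h1>
    <p>one two tree<b>four</b>five</p>
</body>
</html>"""


def ancestor_tags(html, xpath):
    """Return the ancestor tag names of the first XPath match, outermost first.

    Hypothetical helper for illustration; returns [] when nothing matches.
    """
    tree = etree.HTML(html)
    matches = tree.xpath(xpath)
    if not matches:
        return []
    # iterancestors() yields the nearest ancestor first, so reverse the list
    tags = [elt.tag for elt in matches[0].iterancestors()]
    tags.reverse()
    return tags


print(ancestor_tags(HTML_DOC, '//b'))   # ['html', 'body', 'p']
print(ancestor_tags(HTML_DOC, '//h1'))  # ['html', 'body']
```

Reversing the list keeps the traversal itself unchanged, so the same helper should work for any element the XPath expression can select.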