python, xpath, python-requests, lxml, lxml.html

Python lxml XPath: get nodes whose attribute matches a specific string pattern


I'm learning XPath and trying to get the value of a node that has a specific attribute, using Python lxml/html on a Google Play Store page as an example. From the markup below, I want to get the developer email from the "a" node whose "href" attribute starts with "mailto:". My Python snippet returns the app name but an empty developer email. Thank you.

<html>
<div class="id-app-title" tabindex="0">Candy Crush Saga</div>
<div class="meta-info meta-info-wide"> 
<div class="title"> Developer </div> 
<a class="dev-link" href="https://www.google.com/url?q=http://candycrush.com" rel="nofollow" target="_blank"> Visit website </a>
<a class="dev-link" href="mailto:[email protected]"
rel="nofollow" target="_blank">[email protected] </a> ##Interesting part here
</div>
</html>

Python code (2.7)

 def get_app_from_link(self,link):
    start_page=requests.get(link)
    #print start_page.text
    tree = html.fromstring(start_page.text)
    name = tree.xpath('//div[@class="id-app-title"]/text()')[0]
    #developer=tree.xpath('//div[@class="dev-link"]//*/div/@href')
    developer=tree.xpath('//div[contains(@href,"mailto") and @class="dev-link"]/text()')
    print name,developer
    return 

Solution

  • Your XPath selects div elements, but the href attribute is on the a element. Use a instead of div:

    '//a[contains(@href,"mailto") and @class="dev-link"]/text()'

    Also, your function doesn't return anything. Return the values like this:

    def get_app_from_link(self, link):
        # your code
        return name, developer
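
For reference, here is a minimal self-contained sketch of the same idea that runs against an inline copy of the markup instead of a live page. It is only an illustration: the function name extract_app_info, the PAGE string, and the address dev@example.com are placeholders of mine, not part of the original code. It uses the XPath 1.0 function starts-with() rather than contains(), since the question asks for an href that starts with "mailto:", and it reads the @href attribute rather than the link text.

```python
from lxml import html

# Inline sample modelled on the question's markup.
# The email address is a made-up placeholder.
PAGE = """
<html><body>
<div class="id-app-title" tabindex="0">Candy Crush Saga</div>
<div class="meta-info meta-info-wide">
  <div class="title"> Developer </div>
  <a class="dev-link" href="https://www.google.com/url?q=http://candycrush.com"> Visit website </a>
  <a class="dev-link" href="mailto:dev@example.com"> dev@example.com </a>
</div>
</body></html>
"""


def extract_app_info(page_source):
    """Return (app name, developer email); either may be None if not found."""
    tree = html.fromstring(page_source)

    # xpath() always returns a list, so check before indexing.
    names = tree.xpath('//div[@class="id-app-title"]/text()')

    # Select <a> (not <div>) and match only hrefs that begin with "mailto:".
    hrefs = tree.xpath(
        '//a[@class="dev-link" and starts-with(@href, "mailto:")]/@href'
    )

    name = names[0].strip() if names else None
    # Drop the "mailto:" scheme to keep just the address.
    email = hrefs[0][len("mailto:"):] if hrefs else None
    return name, email


name, email = extract_app_info(PAGE)
print(name, email)
```

Checking the lists before indexing avoids an IndexError when the page layout changes and nothing matches. The print call is written for Python 3; under Python 2.7 without the print_function import it prints a tuple instead.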