Tags: python, web-scraping, python-3.6, python-requests-html

Get rendered JavaScript content from a website in Python


I'm using Python 3.6.6 for this.

I'm trying to get the current version number of PyCharm from the PyCharm download page (https://www.jetbrains.com/pycharm/download/#section=windows). The version number is displayed prominently, but I can't extract it because I don't know how to handle the page's JavaScript properly.

I tried parsing it with requests_html from this element:

<li>Version: <span data-code="PCP" data-release-version=""></span></li>

After the JavaScript has run, this part should look like this:

<li>Version: <span data-code="PCP" data-release-version="">2018.1.4</span></li>
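Before rendering, a plain HTML parser only sees what the server sent, so the span is empty; the version number appears only after a browser engine has run the page's JavaScript. The sketch below illustrates this with the standard library's `html.parser` on the two snippets above. The names `SpanTextParser` and `span_text` are hypothetical helpers, not part of requests_html.

```python
from html.parser import HTMLParser


class SpanTextParser(HTMLParser):
    """Collect the text inside <span> tags whose data-code attribute matches."""

    def __init__(self, code):
        super().__init__()
        self.code = code
        self.inside = False  # True while inside a matching <span>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and dict(attrs).get("data-code") == self.code:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def span_text(html, code="PCP"):
    """Return the stripped text of the matching span(s), or '' if empty."""
    parser = SpanTextParser(code)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


raw = '<li>Version: <span data-code="PCP" data-release-version=""></span></li>'
rendered = '<li>Version: <span data-code="PCP" data-release-version="">2018.1.4</span></li>'

print(repr(span_text(raw)))       # '' : nothing to extract before JavaScript runs
print(repr(span_text(rendered)))  # '2018.1.4'
```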

Here is my script, which doesn't work:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://www.jetbrains.com/pycharm/download/#section=windows')


r.html.render()
item = r.html.find('<span data-code="PCP" data-release-version=""></span>')


print(item)

I don't mind if extra markup comes along with the result; I would simply filter it out with a regex. Still, the only thing I get from this is:

[<Element 'span' data-code='PCP' data-release-version=''>]
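Two things are worth noting about this output. `find()` in requests_html expects a CSS selector (for example `span[data-code="PCP"]`), not a fragment of HTML, and what gets printed is a list of `Element` objects, whose repr shows the tag and attributes but not the text; the text is in the element's `.text` attribute, and it is only filled in once rendering has actually completed. For the regex filtering mentioned above, here is a minimal sketch that works on the rendered page source as a string (for example `r.html.html` after `render()`). The helper name `extract_version` is hypothetical, and the pattern assumes the markup keeps exactly the shape shown in the question.

```python
import re

# Assumes the rendered markup looks like the snippet in the question:
# <span data-code="PCP" data-release-version="">2018.1.4</span>
VERSION_RE = re.compile(
    r'<span[^>]*\bdata-code="PCP"[^>]*>\s*(\d+(?:\.\d+)+)\s*</span>'
)


def extract_version(page_source):
    """Return the first PCP version number found in the HTML, or None."""
    match = VERSION_RE.search(page_source)
    return match.group(1) if match else None


rendered = '<li>Version: <span data-code="PCP" data-release-version="">2018.1.4</span></li>'
print(extract_version(rendered))  # 2018.1.4
print(extract_version('<span data-code="PCP" data-release-version=""></span>'))  # None
```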

Solution

  • Update:

    I found a solution myself. It seems that render() needs a short sleep before the version span has been filled in. I also used XPath instead of find().

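    from requests_html import HTMLSession
    
    session = HTMLSession()
    r = session.get('https://www.jetbrains.com/pycharm/download/#section=windows')
    
    # Run the page's JavaScript; the short sleep gives it time to fill in the version span.
    r.html.render(sleep=0.1)
    # Absolute XPath to the version <span>; text() selects its text, so xpath() returns a list of strings.
    item = r.html.xpath('/html/body/div[1]/div[2]/div/div[2]/div[1]/div[2]/ul[1]/li[1]/span/text()')
    
    print('------------------------------------------------')
    print(item)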
    

    My result:

    ['2018.1.4']
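The absolute XPath above is likely to break as soon as JetBrains changes the page layout. A minimal sketch of a sturdier variant, assuming the span keeps its `data-code="PCP"` attribute: select it with a CSS attribute selector and keep the first text that looks like a version number, since other matching spans may still be empty. The names `fetch_pycharm_version` and `pick_version` are hypothetical; `render()` needs Chromium, which requests_html downloads on first use, and the `sleep=1` value is a guess you may need to tune.

```python
import re

# A version looks like "2018.1.4": digits separated by dots.
VERSION_PATTERN = re.compile(r"^\d+(?:\.\d+)+$")


def pick_version(texts):
    """Return the first string that looks like a version number, or None."""
    for text in texts:
        text = text.strip()
        if VERSION_PATTERN.match(text):
            return text
    return None


def fetch_pycharm_version(url="https://www.jetbrains.com/pycharm/download/#section=windows"):
    # Imported here so pick_version() is usable without requests_html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        r = session.get(url)
        # Give the page's JavaScript a moment to fill in the version spans.
        r.html.render(sleep=1)
        # CSS attribute selector instead of an absolute XPath.
        spans = r.html.find('span[data-code="PCP"]')
        return pick_version(span.text for span in spans)
    finally:
        # Also shuts down the headless browser started by render().
        session.close()
```

Calling `print(fetch_pycharm_version())` should print the current version string, or `None` if the span was still empty when the page was read.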