javascript, python, html, css, python-requests-html

Not all HTML can be accessed in python-requests-html


I am trying to run a script to find a few numbers on a website, but it doesn't seem to let me get past a certain point. Here is the script:

from requests_html import HTMLSession
import requests

url = "https://auction.chimpers.xyz/"
try:
    s = HTMLSession()
    r = s.get(url)
except requests.exceptions.RequestException as e:
    print(e)

r.html.render(sleep=1)

title = r.html.find("title",first=True).text
print(title)

divs_found = r.html.find("div")
print(divs_found)

meta_desc = r.html.xpath('//*[@id="description-view"]/div',first=True)
print(meta_desc)

price = r.html.find(".m-complete-info div",first=True)
print(price)

The result of this is:

Chimpers Genesis 100  
[<Element 'div' id='app'>, <Element 'div' data-v-1d311e85='' id='m-connection' class=('manifold',)>, <Element 'div' id='description-view'>, <Element 'div' class=('manifold', 'm-complete-view')>, <Element 'div' data-v-cf8dbfe2='' class=('manifold', 'loading-screen')>, <Element 'div' class=('manifold-logo',)>]
<Element 'div' class=('manifold', 'm-complete-view')>  
None  
[Finished in 3.9s]

website: https://auction.chimpers.xyz/

and the information I am trying to find is the auction price data shown on that page.

Clearly there are more HTML elements past the ones printed out in the list, but every time I try to access them, even using r.html.xpath('//*[@id="description-view"]/div/div[2]/div/div[2]/span/span[1]'), it returns None, even though that is the exact XPath I copied via Inspect in Chrome.

Any idea why this is happening, and how should I go about it?


Solution

  • I don't actually know if it's even possible to do with requests_html, but it is with Selenium.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.chrome.options import Options
    
    url = "https://auction.chimpers.xyz/"
    class_names = ["m-price-label", "m-price-data"]
    
    driver_options = Options()
    driver_options.add_argument("--headless")
    driver = webdriver.Chrome(options=driver_options)
    driver.get(url)
    
    results = {}
    
    try:
        for class_name in class_names:
            element = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CLASS_NAME, class_name)))
            # Get the element's inner text
            results[class_name] = element.get_attribute("textContent")
    finally:
        driver.quit()
    
    print(results)
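
    A note on the design choice: get_attribute("textContent") is used instead of element.text because .text only returns text that is currently visible on the page, while textContent also includes text from hidden or off-screen elements, which tends to be more reliable when running headless.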
    

    Feel free to use a webdriver other than Chrome.
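
    For example, a minimal sketch with Firefox instead of Chrome might look like this (assuming geckodriver is installed, or that your Selenium version can fetch the driver automatically); only the driver setup changes, and the waiting logic above stays the same:

    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    driver_options = Options()
    driver_options.add_argument("--headless")  # run Firefox without a visible window
    driver = webdriver.Firefox(options=driver_options)
    try:
        driver.get("https://auction.chimpers.xyz/")
        # ... same WebDriverWait / get_attribute("textContent") logic as above ...
        print(driver.title)
    finally:
        driver.quit()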