python, beautifulsoup, httpx

Python: parsing a site returns only <html></html>


There is a website that I need to parse. However, when I request it, the only response I get is <html></html>.

I tried changing the User-Agent and the cookies, but that doesn't help.

from bs4 import BeautifulSoup
import httpx

response = httpx.get('https://lolz.guru/market/')
soup = BeautifulSoup(response.text, 'lxml')

print(response.text)

Solution

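  • An empty <html></html> suggests that the page content is filled in by JavaScript, which httpx does not execute. You can use requests_html, which is able to render JavaScript: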

    from bs4 import BeautifulSoup
    from requests_html import HTMLSession
    
    
    session = HTMLSession()
    resp = session.get('https://lolz.guru/market/')
    
    resp.html.render(sleep=1, keep_page=True)
    soup = BeautifulSoup(resp.html.html, "lxml")
    
    # Print the text content of the rendered page
    # (use resp.html.html if you need the full markup)
    print(soup.text)
    

    You can install it using pip: pip install requests-html
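
Before switching to a renderer, it may be worth confirming that the static response really has no visible content. The following is a minimal sketch that uses only the standard library; `is_empty_shell` and `_VisibleText` are hypothetical helper names introduced here for illustration, not part of httpx, BeautifulSoup, or requests-html. It treats a page as an empty shell when no text is found outside `script`, `style`, `title`, and `noscript` elements, which is an assumption about what "empty" means and may need adjusting for a particular site.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text that is not inside script, style, title or noscript."""

    IGNORED = {"script", "style", "title", "noscript"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._ignored_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.IGNORED:
            self._ignored_depth += 1

    def handle_endtag(self, tag):
        if tag in self.IGNORED and self._ignored_depth > 0:
            self._ignored_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text outside the ignored elements.
        if self._ignored_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def is_empty_shell(html: str) -> bool:
    """Return True when the markup contains no visible text at all."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return not parser.chunks
```

With the code from the question, this could be called as `is_empty_shell(response.text)`. If it returns True, the static response has no visible text, and rendering the page (as in the answer above) is a reasonable next step.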