The code below gets stuck after printing hi in the output. Can you please check what is wrong with it? And is the site secured in a way that requires some special authentication?
from bs4 import BeautifulSoup
import requests
print('hi')
rooturl='http://www.hoovers.com/company-information/company-search.html'
r=requests.get(rooturl)
print('hi1')
soup=BeautifulSoup(r.content,"html.parser")
print('hi2')
print(soup)
Unable to read the HTML page with Beautiful Soup.
The reason you hit this problem is that the website thinks you are a robot, so it won't send anything back; it even holds the connection open and lets you wait forever. If you imitate a browser's request instead, the server will no longer treat you as a robot.
Adding headers is the simplest way to deal with this, but sometimes passing only a User-Agent is not enough (as in this case). Copy your browser's request and remove the unnecessary headers through testing. If you are lazy, you can use your browser's headers as they are, but do not copy all of them when you want to upload files.
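If you want to confirm this diagnosis first, give the bare request a timeout: the server accepts the connection but never answers, so the call raises a Timeout instead of blocking forever. A quick check (the 10-second value is arbitrary):

import requests

rooturl = 'http://www.hoovers.com/company-information/company-search.html'
try:
    # With no browser-like headers the server stalls and never responds,
    # so this raises a Timeout rather than hanging indefinitely.
    requests.get(rooturl, timeout=10)
except requests.exceptions.Timeout:
    print("Timed out: the server is stalling the request, not refusing it")

With browser-like headers, the same request goes through: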
from bs4 import BeautifulSoup
import requests

rooturl = 'http://www.hoovers.com/company-information/company-search.html'
with requests.Session() as se:
    # Browser-like headers so the server does not treat the request as a bot
    se.headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
        "Accept-Encoding": "gzip, deflate",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en"
    }
    resp = se.get(rooturl)
    print(resp.content)
    soup = BeautifulSoup(resp.content, "html.parser")
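One more detail worth knowing: assigning to se.headers replaces requests' default headers entirely, which works here but is easy to trip over. Session.headers.update() merges your headers into the defaults instead, and passing a timeout guards against the original hang. A minimal sketch of that variant, under the same assumptions (the timeout values are just an illustration):

from bs4 import BeautifulSoup
import requests

rooturl = 'http://www.hoovers.com/company-information/company-search.html'

with requests.Session() as se:
    # update() merges into the session's default headers instead of replacing them;
    # add the other browser headers here if the User-Agent alone is not enough
    se.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
    })
    # timeout=(connect seconds, read seconds); without it the call can block forever
    resp = se.get(rooturl, timeout=(5, 30))
    resp.raise_for_status()
    soup = BeautifulSoup(resp.content, "html.parser")
    print(soup.title)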