My problem is quite simple: I want to get a response from the following website: http://www.pulsant.com
I simply want to check whether there is a redirect, and to do this I am using the following code:
import urllib.request as Request
import urllib.parse
url = 'http://www.pulsant.com'
user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
headers = {'User-Agent': user_agent}
data = urllib.parse.urlencode(values).encode("utf-8")
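# debug_requests_on() was called here, but it is not defined in this snippet, so the call is left out
# Note: passing data turns this into a POST request rather than a GET
req = Request.Request(url, data, headers)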
response = Request.urlopen(req)
the_page = response.read()
This code works for a lot of websites; however, for the occasional few it just won't work, and I get this response:
HTTPError: HTTP Error 503: Service Temporarily Unavailable
This website is definitely online and working, however the response it gives me is not what I would expect.
I have tried spoofing my user agent and different methods of sending a request; however, I cannot for the life of me figure out how to get a response from this website.
I tried using the requests module to get a response, and it worked. If this isn't working for you, the website might have some GeoIP blocking or another mechanism in place:
import requests
from bs4 import BeautifulSoup
r = requests.get('http://www.pulsant.com')
soup = BeautifulSoup(r.text, 'lxml')
print(soup.title.text)
print(soup.h1.text)
Prints:
Hybrid IT & Managed Cloud Hosting Solutions | Pulsant
Experts in compliant business cloud platforms