I'm trying to fetch a web page using httplib (or urllib2; either is fine for me).
I just want to access it to parse the HTML and look for something. However, no matter how I try, every attempt ends in an error from the server.
For example:
import httplib
conn = httplib.HTTPSConnection("mangapanda.onl")
conn.request("GET", "/")
response = conn.getresponse()
print response.status, response.reason
Ends with:
500 Internal Server Error
And:
import urllib2
redirect_handler = urllib2.HTTPRedirectHandler()
opener = urllib2.build_opener(redirect_handler)
r = opener.open('https://www.mangapanda.onl/')
print r.getcode(), r.msg
Raises an exception in the open function with:
urllib2.HTTPError: HTTP Error 403: Forbidden
I've tried several URLs with each library, removing the trailing "/" from the URL and so forth, but I haven't been able to get it working.
Furthermore, what I really want is to understand why this is happening. The only reason I can think of is that the site uses some kind of redirect that the library isn't able to follow, but after the last snippet I thought it would follow it.
Is it a URL syntax problem? How should I write it? Why? How can I solve this?
This is probably because the server can't tell where the request is coming from. Some websites also reject requests they deem to be bot activity, and by default urllib2 identifies itself with a User-Agent like "Python-urllib/2.x", while httplib sends no User-Agent at all. To work around that, you can provide "fake" information with the request. Take a look at the urllib2.Request class, which accepts a headers argument for supplying this "fake data", i.e. the HTTP headers.
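As a rough sketch of that idea, something like the following attaches browser-like headers to the request. The header values and the helper names `build_request` and `fetch` are made up for illustration, and the import fallback is only there so the same snippet also runs on Python 3. There is no guarantee this particular server will accept it, since some sites use other bot checks as well.

```python
try:
    import urllib2 as request_lib          # Python 2, as in the question
except ImportError:
    import urllib.request as request_lib   # Python 3 equivalent

# Hypothetical browser-like headers; the exact values are illustrative only.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml",
}


def build_request(url, headers=None):
    """Return a Request object that carries browser-like headers."""
    return request_lib.Request(url, headers=headers or BROWSER_HEADERS)


def fetch(url):
    """Open the URL with the custom headers and return (status code, body)."""
    response = request_lib.urlopen(build_request(url))
    try:
        return response.getcode(), response.read()
    finally:
        response.close()
```

Calling `fetch('https://www.mangapanda.onl/')` would then send the request with those headers instead of the library defaults. If the server still answers 403, catching `urllib2.HTTPError` and reading its `code` and `headers` attributes may give more hints about what it expects.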