I am trying to find the last chapter number of a story on www.fanfiction.net, just for fun. Since the chapter URLs follow a fixed pattern, I thought I would increment the chapter number until I reach a URL that does not exist.
To check whether a URL exists, I tried the script from this Stack Overflow question.
However, I found that the server does not return an error status (400 or above) for a missing chapter; it returns a 200 response with a message instead. What would be the best way to tell whether the page exists or not?
Here is a link that actually exists, and here is one that does not exist.
How can I do so?
Thanks to GregSchoen, I worked it out. I hope it is correct, though :)
I checked the value of resp.getheader("last-modified", None): it gives a date for active links and None for those that are not.
Thanks a lot
If you do a HEAD request on the URLs you supplied, the Last-Modified header is set on valid pages but not on invalid ones. This would be an easy way to key on valid pages, since their server does not respond with a proper HTTP status code (such as 404) for pages that do not exist.
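As a rough sketch of that check in Python 3, the snippet below sends a HEAD request with `http.client` and treats a page as valid only when a Last-Modified header comes back. The helper names `page_exists` and `last_chapter`, the `{chapter}` placeholder in the URL template, and the `limit` cap are all assumptions made for illustration; the header test itself is only a heuristic observed on this site and may not hold elsewhere or stay true over time.

```python
import http.client
from urllib.parse import urlsplit


def page_exists(url, timeout=10):
    """Send a HEAD request and report whether the page looks valid.

    Heuristic (site-specific assumption): valid pages carry a
    Last-Modified header, missing pages do not, even though both
    may be served with status 200.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        # Still reject real error codes in case the server does send one.
        # Redirects are not followed, so a 3xx without the header is False.
        has_header = resp.getheader("last-modified", None) is not None
        return resp.status < 400 and has_header
    finally:
        conn.close()


def last_chapter(url_template, exists=page_exists, limit=1000):
    """Return the highest chapter number whose URL passes `exists`.

    `url_template` is a hypothetical pattern containing "{chapter}".
    `limit` caps the number of requests so the loop always ends.
    Returns 0 if even chapter 1 is missing.
    """
    last = 0
    for chapter in range(1, limit + 1):
        if not exists(url_template.format(chapter=chapter)):
            break
        last = chapter
    return last
```

You would call `last_chapter` with the story's own URL pattern, replacing the chapter number with `{chapter}`. Since this makes one request per chapter, it may be polite to add a short pause between requests for long stories.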