I am teaching myself web scraping and wanted to download a bunch of .pgn files (essentially text files) using requests. The filenames are based on dates but are not strictly chronological. I ran a loop over possible dates, but if an indexed date doesn't correspond to an actual file, I still end up downloading filename.pgn as a text file containing the HTML of the error page. Instead, I want those dates to be skipped.
Here's an example:
If I run:
filename = 'games9jul18.pgn'
url = 'https://www.chesspublishing.com/p/9/jul18/' + filename
response = requests.post(url, data=payload)
with open(filename, 'wb') as e:
    e.write(response.content)
with the appropriate authentication in payload, the correct file games9jul18.pgn is saved. But if I run:
filename = 'games9aug18.pgn'
url = 'https://www.chesspublishing.com/p/9/aug18/' + filename
response = requests.post(url, data=payload)
with open(filename, 'wb') as e:
    e.write(response.content)
I still get a saved file games9aug18.pgn, but instead of being a 'real' .pgn file, it's a text file of the HTML of the error page. Navigating to the error page in my browser, there is no error code, just a big chunk of text: "The page you've asked may have been removed, or perhaps never existed."
Unfortunately, it's not possible to loop only over the filenames corresponding to actual files, due to the inconsistent date structure. How can I add a condition so that no .pgn file is created when the error page is reached?
You should check the response status code. "Page not found" is 404, so you could check for that code, or simply check for a successful request, which is 200:
response = requests.post(url, data=payload)
if response.status_code == 200:
    with open(filename, 'wb') as e:
        e.write(response.content)
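Since you mention the error page shows no error code in your browser, it's possible the server answers 200 even for missing files. In that case you can also inspect the response itself before saving. Here is a minimal sketch of the full loop, assuming payload holds your authentication data and dates is a list of date strings you want to try; the Content-Type check and the error-page phrase are assumptions you may need to adjust for this site:

import requests

dates = ['jul18', 'aug18', 'sep18']  # hypothetical date strings to try

for date in dates:
    filename = 'games9' + date + '.pgn'
    url = 'https://www.chesspublishing.com/p/9/' + date + '/' + filename
    response = requests.post(url, data=payload)

    # Skip anything that isn't a successful response.
    if response.status_code != 200:
        continue

    # Assumption: if the server returns 200 even for missing files, the error
    # page will be HTML, so skip responses that look like HTML or that contain
    # the error text you saw in the browser.
    if ('text/html' in response.headers.get('Content-Type', '')
            or b"The page you've asked may have been removed" in response.content):
        continue

    with open(filename, 'wb') as f:
        f.write(response.content)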