
How to check HTTP status of a file online without fully downloading the file?


I have a database of thousands of files online, and I want to check their status (e.g. whether the file exists, whether it returns a 404, etc.) and update this in my database.

I've used urllib.request to download files in a Python script. However, downloading terabytes of files would obviously take a long time. Parallelizing the process would help, but ultimately I don't want to download all the data at all, just check the status. Is there a way (using urllib or another package) to get the HTTP response code for a URL without downloading the file?

Additionally, if I can get the file size from the server (typically reported in the Content-Length response header), then I can also update this in my database.


Solution

  • If your web server is standards-compliant, you can use a HEAD request instead of a GET. The server returns the same status line and headers without sending the response body, so nothing is actually downloaded.
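
A minimal sketch using only the standard library: `urllib.request.Request` accepts a `method` argument, so you can issue a HEAD request and read the status code and the `Content-Length` header from the response. Note that 4xx/5xx responses are raised as `urllib.error.HTTPError`, which still carries the status code. The function name `check_url` is just an illustration.

```python
import urllib.request
import urllib.error

def check_url(url, timeout=10):
    """Send a HEAD request; return (status_code, content_length).

    content_length is None if the server does not report it.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            length = response.headers.get("Content-Length")
            return response.status, int(length) if length is not None else None
    except urllib.error.HTTPError as e:
        # Error statuses (404, 500, ...) arrive as exceptions,
        # but the status code is still available on them.
        return e.code, None
```

Since each check is mostly network wait, you can then parallelize cheaply with `concurrent.futures.ThreadPoolExecutor` by mapping `check_url` over your list of URLs. Also be aware that some servers answer HEAD with 405 (Method Not Allowed) or omit `Content-Length`; for those you may need a fallback GET that you close without reading the body.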