
Checksum in HTTP response header - why not?


I'd like to know a file's checksum (a SHA-256 hash or anything similar) when I start downloading the file from an HTTP server. It could be sent as one of the HTTP response headers.

The HTTP ETag header is something similar, but it's used only for browser cache validation, and from what I've noticed, every site calculates it differently; it doesn't look like any hash I know.

Some software download sites provide various file checksums as separate files to download (for example, the SHA-1 hashes for the latest Ubuntu 16.04: http://releases.ubuntu.com/16.04/SHA1SUMS). Wouldn't it be easier to include the hash in an HTTP response header and have the browser check it when the download ends, instead of forcing the user to do it manually?
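The check the question proposes could work like this: the client hashes the body incrementally as chunks arrive, then compares the result with the checksum announced at the start of the response. This is a minimal Python sketch; the function names are hypothetical, no real HTTP client or header is involved, and the arriving chunks are simulated.

```python
import hashlib
from typing import Iterable


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash data incrementally, the way a client could while bytes arrive."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)  # no need to buffer the whole file in memory
    return h.hexdigest()


def verify_download(chunks: Iterable[bytes], expected_hex: str) -> bool:
    """Compare the computed digest with the value announced up front."""
    return sha256_of_chunks(chunks) == expected_hex.strip().lower()


if __name__ == "__main__":
    data = b"example file contents"
    announced = hashlib.sha256(data).hexdigest()  # what the server would announce
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]  # simulated arrival
    print(verify_download(chunks, announced))       # True
    print(verify_download(chunks[:-1], announced))  # False: truncated download
```

Hashing chunk by chunk means the verification costs no extra pass over the file once the download ends.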

I guess the whole HTTP-based Internet works because it runs over TCP, which is reliable and ensures the received bytes are exactly the same as those sent by the server. But if TCP is so "cool", why do we check file hashes manually (see the Ubuntu example above)? Also, a lot can go wrong during a download (client or server disk corruption, the file being modified on the server side, etc.). If I'm right, all of this could be fixed simply by sending the file hash when the download starts.


Solution

  • The checksum provided separately from the file is used for integrity checks when the transfer is not protected by TLS or is indirect.

    I think I understand your doubt, because I had the same question about checksums. Let's work it out.

    There are two problems to consider:

    1. The file is corrupted during transfer
    2. The file is modified by an attacker

    And three protocols involved in this question:

    1. HTTP protocol
    2. SSL/TLS protocol
    3. TCP protocol

    Now we split this into two situations:

    1. The file provider and the client transfer the file directly: no proxy, no offline copy (USB disk).

    The TCP protocol promises that the data the client receives is exactly the data the server sent, using checksums and acknowledgements (ACKs).
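TCP's checksum is the 16-bit one's-complement "Internet checksum" described in RFC 1071. It is cheap but weak, which helps explain why a cryptographic hash of a large download is not redundant even over TCP. The sketch below (the function name is hypothetical) computes that style of checksum over a byte string and shows one class of error it cannot detect: swapping two aligned 16-bit words leaves the sum unchanged. It is simplified; real TCP also sums a pseudo-header and the TCP header, which are omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


original = b"ABCDEFGH"
swapped = b"CDABEFGH"   # first two 16-bit words swapped
flipped = b"ABCDEFGI"   # one bit changed in the last byte

print(hex(internet_checksum(original)))                           # 0xeeea
print(internet_checksum(original) == internet_checksum(swapped))  # True: error missed
print(internet_checksum(original) == internet_checksum(flipped))  # False: error caught
```

Because the sum is commutative, any reordering of aligned words goes unnoticed, whereas SHA-256 or a TLS record MAC would change completely.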

    The TLS protocol promises that the server is authenticated (it really is ubuntu.com) and that the data has not been altered by anyone in the middle.
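Part of what lets TLS detect tampering is that every record carries an authentication tag computed with a key only the two endpoints share (either an HMAC or an AEAD cipher such as AES-GCM, depending on the cipher suite). The sketch below illustrates the idea with Python's `hmac` module. It is not TLS: the key is a random placeholder rather than one derived from a handshake, and `tag`/`accept` are hypothetical helper names.

```python
import hashlib
import hmac
import secrets

# Placeholder session key; in TLS it would be derived during the handshake.
key = secrets.token_bytes(32)


def tag(message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag that only key holders can produce."""
    return hmac.new(key, message, hashlib.sha256).digest()


def accept(message: bytes, received_tag: bytes) -> bool:
    """Verify the tag; compare_digest avoids leaking timing information."""
    return hmac.compare_digest(tag(message), received_tag)


record = b"file contents sent over the connection"
t = tag(record)
print(accept(record, t))               # True
print(accept(record + b"malware", t))  # False: altered in transit
```

Unlike a bare SHA-256 of the data, an attacker who changes the bytes cannot compute a matching tag without the key.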

    So there is no need to add a checksum header to HTTP when using HTTPS.

    But when TLS is not used, forgery is possible: a man in the middle can hand the client a malicious file, and could rewrite any checksum header in the same response just as easily.
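To see why an in-band checksum header would not stop this, the sketch below simulates a plain-HTTP response passing through an attacker. The header name `X-Checksum-SHA256` and all functions are hypothetical; the point is only that whoever can replace the body can also recompute and replace the header, so the client's check still passes.

```python
import hashlib

HEADER = "X-Checksum-SHA256"  # hypothetical header name, not a standard one


def serve(body: bytes) -> tuple[dict[str, str], bytes]:
    """Origin server: sends the body plus its SHA-256 in a header."""
    return {HEADER: hashlib.sha256(body).hexdigest()}, body


def man_in_the_middle(headers: dict[str, str], body: bytes) -> tuple[dict[str, str], bytes]:
    """On plain HTTP the attacker can rewrite both the body and the header."""
    evil = b"malicious payload"
    forged = dict(headers)
    forged[HEADER] = hashlib.sha256(evil).hexdigest()  # recompute to match
    return forged, evil


def client_check(headers: dict[str, str], body: bytes) -> bool:
    """Client: trusts whatever checksum arrived with the response."""
    return hashlib.sha256(body).hexdigest() == headers[HEADER]


headers, body = man_in_the_middle(*serve(b"genuine file"))
print(body)                         # b'malicious payload'
print(client_check(headers, body))  # True: the forged file passes the check
```

Such a header would still catch accidental corruption, but it offers no protection against a deliberate forger on the same connection.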

    2. The file provider and the client transfer the file indirectly: through a CDN, a mirror, or offline (USB disk).

    Many sites, like ubuntu.com, use a third-party CDN to serve static files, and the CDN servers are not managed by ubuntu.com. For example, http://releases.ubuntu.com/somefile.iso redirects to http://59.80.44.45/somefile.iso.

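    Now the checksum must be provided out-of-band: the server actually delivering the file is not authenticated as the original provider, so we don't trust that connection. A checksum header in the HTTP response would come from that same untrusted server, so it does not help in this situation.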
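Verifying a file fetched from a mirror or CDN then comes down to hashing it locally and comparing against the list obtained from the trusted source. A minimal Python sketch, assuming a checksum list in the `sha1sum`/`sha256sum` output format (hex digest, a space, then a space or `*`, then the file name) like the SHA1SUMS file mentioned in the question; `parse_sums`, `verify`, and the file name and contents are made up for illustration. The list is only as trustworthy as the channel it came from, which is why it has to be fetched from (or signed by) the provider rather than taken from the mirror.

```python
import hashlib


def parse_sums(text: str) -> dict[str, str]:
    """Parse lines like '<hex digest> *<filename>' or '<hex digest>  <filename>'."""
    sums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        sums[name.lstrip("*")] = digest.lower()  # '*' marks binary mode
    return sums


def verify(name: str, data: bytes, sums_text: str, algo: str = "sha1") -> bool:
    """Check data downloaded from a mirror against the trusted checksum list."""
    expected = parse_sums(sums_text).get(name)
    if expected is None:
        return False  # file not listed: treat as unverified
    return hashlib.new(algo, data).hexdigest() == expected


iso = b"pretend this is the ISO from a mirror"
trusted = f"{hashlib.sha1(iso).hexdigest()} *example.iso\n"  # from the trusted origin
print(verify("example.iso", iso, trusted))         # True
print(verify("example.iso", iso + b"!", trusted))  # False
```

For a SHA256SUMS file, pass `algo="sha256"`.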