Tags: rest, http2

Should I use an HTTP/2-specific feature for a REST API that transforms a large file?


The Situation

My team is creating an API that receives a large structured text file (100 MB - 1 TB, 1 GB expected), modifies each row, and returns the resulting file. We can process the file as fast as it is transmitted, so we would like to avoid caching the file on our servers. That said, we favor ease of use for our clients over our own resource use, so this is not a hard requirement.

Some Options

HTTP/1.1 implicitly requires that the full request be processed before the response is sent (except in the case of errors), and bad things can happen, especially with proxies, if you try to get around this. So we were going to bite the bullet, store the request or response, and use another resource in our organization for uploading large files for processing.

HTTP/2 explicitly allows a server to start sending the response before the request has finished, requires that the client read what the server sends, and is already supported in all major browsers.

So, I see a few potential APIs (all POST):

HTTP/1.x: upload/download - there is already some infrastructure for this

/transformed_file_id/ --> returns an id for the uploaded file
/transformed_file/{id} --> returns the transformed data

HTTP/1.x: single request

/transformed_file/ --> returns the transformed version of the file - stores data under the hood

HTTP/2: single request

/transformed_file/ --> returns the transformed version of the file - starts sending the response as soon as it receives the first few kilobytes (see the sketch below)
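
To make the HTTP/2 option concrete, here is a minimal sketch of such a handler in Go. The route matches the option above, but the per-row transform (uppercasing), the port, and the certificate file names are placeholders:

    package main

    import (
        "bufio"
        "io"
        "log"
        "net/http"
        "strings"
    )

    // transformRow stands in for the real per-row transformation.
    func transformRow(row string) string {
        return strings.ToUpper(row) // placeholder
    }

    func main() {
        http.HandleFunc("/transformed_file/", func(w http.ResponseWriter, r *http.Request) {
            if r.Method != http.MethodPost {
                http.Error(w, "POST only", http.StatusMethodNotAllowed)
                return
            }
            in := bufio.NewScanner(r.Body)
            in.Buffer(make([]byte, 0, 1<<20), 1<<20) // allow rows up to 1 MiB
            flusher, _ := w.(http.Flusher)
            // Emit each transformed row as soon as it is read, so the
            // response starts while the request body is still arriving.
            for rows := 0; in.Scan(); rows++ {
                if _, err := io.WriteString(w, transformRow(in.Text())+"\n"); err != nil {
                    return // client went away
                }
                if flusher != nil && rows%1024 == 0 {
                    flusher.Flush()
                }
            }
            if err := in.Err(); err != nil {
                log.Println("read error:", err)
            }
        })
        // Go's net/http negotiates HTTP/2 automatically over TLS.
        log.Fatal(http.ListenAndServeTLS(":8443", "cert.pem", "key.pem", nil))
    }

Note that Go will serve the same handler over HTTP/1.1 with chunked encoding, but, as discussed above, intermediaries may not cooperate with a response that begins before the request ends.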

The Question(s)

Though I wouldn't shy away from it for browser content, is it wise to use HTTP/2 for a service in order to access this feature?

Or is all this a bad idea, and should clients be forced to upload the file in smaller parts (in which case we'll need to write a front end to allow this in a browser interface, which could be quite tough)?


Solution

  • My experience with various clients, servers and proxies is that it is not true that HTTP/1.1 requires the full request to be sent before an application can start responding. It happens all the time.

    On the other hand, if your clients have to upload 100 MiB - 1 TiB of data (!) in a single request, I would set up some mechanism to recover from upload failures, similar to the Range headers for downloads. See also: Standard method for HTTP partial upload, resume upload
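
    There is no HTTP standard for resuming an interrupted upload, so the sketch below invents a convention purely for illustration: the client probes (hypothetically via HEAD and an X-Upload-Offset response header) how many bytes the server kept, seeks to that offset, and re-sends the rest with a Content-Range header, mirroring what Range does for downloads. In Go:

        package upload

        import (
            "fmt"
            "io"
            "net/http"
            "os"
            "strconv"
        )

        // resumeUpload resumes an interrupted upload of f to url.
        // The HEAD probe, the X-Upload-Offset header and the
        // PUT-with-Content-Range convention are assumptions made up
        // for this sketch; no standard defines them.
        func resumeUpload(client *http.Client, url string, f *os.File) error {
            probe, err := http.NewRequest(http.MethodHead, url, nil)
            if err != nil {
                return err
            }
            resp, err := client.Do(probe)
            if err != nil {
                return err
            }
            resp.Body.Close()
            // How many bytes survived the failed attempt?
            offset, _ := strconv.ParseInt(resp.Header.Get("X-Upload-Offset"), 10, 64)
            if _, err := f.Seek(offset, io.SeekStart); err != nil {
                return err
            }
            fi, err := f.Stat()
            if err != nil {
                return err
            }
            req, err := http.NewRequest(http.MethodPut, url, f)
            if err != nil {
                return err
            }
            req.Header.Set("Content-Range",
                fmt.Sprintf("bytes %d-%d/%d", offset, fi.Size()-1, fi.Size()))
            resp, err = client.Do(req)
            if err != nil {
                return err
            }
            defer resp.Body.Close()
            if resp.StatusCode != http.StatusOK {
                return fmt.Errorf("resume failed: %s", resp.Status)
            }
            return nil
        }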

    Having said that, with HTTP/2 and big uploads you have to pay special attention to the client's flow control send window. This is 64 KiB by default, which means that the client can send at most 64 KiB before it must wait for the server to acknowledge that content. The acknowledgement has to travel from server to client, so network latency plays an important role here: the client may be really fast at writing the 64 KiB, but then spend most of its time waiting for the server's acknowledgement. This can cause terrible upload slowdowns.
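
    To put a number on it: a single stream can never sustain more than roughly one window per round trip, regardless of the link's bandwidth. Assuming a 50 ms round trip purely for illustration:

        max upload throughput ≈ window / RTT
        64 KiB / 50 ms ≈ 1.25 MiB/s  →  roughly 14 minutes per GiB
        12 MiB / 50 ms ≈ 240 MiB/s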

    Just to give you an idea, browsers (Firefox, for example) enlarge their receive window from 64 KiB to 12 MiB (almost 200x) to be able to perform fast downloads from servers. Unfortunately, they don't do the same for uploads.

    You don't specify whether your clients are browsers or not; if they are not, make sure you have control over the configuration of the flow control windows, both send and receive, and enlarge them enough that uploads are not slowed down waiting for flow control acknowledgements. A sketch of how a server could do this follows.
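
    As a minimal sketch, using Go's golang.org/x/net/http2 package (the window sizes below are arbitrary examples; other servers expose equivalent settings): the MaxUploadBuffer* fields set the receive windows the server advertises, which is exactly what bounds how much a client may send before waiting for an acknowledgement.

        package main

        import (
            "log"
            "net/http"

            "golang.org/x/net/http2"
        )

        func main() {
            srv := &http.Server{Addr: ":8443", Handler: http.DefaultServeMux}
            // Enlarge the receive windows the server advertises so that
            // uploading clients are not throttled by a small default
            // window. These sizes are examples, not recommendations.
            h2 := &http2.Server{
                MaxUploadBufferPerStream:     16 << 20, // 16 MiB per stream
                MaxUploadBufferPerConnection: 64 << 20, // 64 MiB per connection
            }
            if err := http2.ConfigureServer(srv, h2); err != nil {
                log.Fatal(err)
            }
            log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
        }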