http post http-headers url-encoding

Which to choose, application/x-www-form-urlencoded or multipart/form-data, for files sized in GB?


I am sending some video files (which can be several GB in size) as application/x-www-form-urlencoded over HTTP POST.

The following link suggests that it would be better to transmit the data as multipart/form-data when the content is non-alphanumeric.

  1. Which encoding would be better to transmit data of this kind?

  2. Also how can I find the length of encoded data (data encoded with application/x-www-form-urlencoded)?

  3. Will encoding the binary data consume much time?

  4. In general, the encoding leaves alphanumeric characters (and a few others) unescaped. So, can we skip encoding for binary data (like video)? How can we skip it?


Solution

  • x-www-form-urlencoded treats the value of an entry in the form data set as a sequence of bytes (octets).
    Of the 256 possible byte values, only 66 are left as is or are still encoded as a single byte; the others are replaced by a percent sign followed by the hexadecimal representation of their value. Each such escape takes at least three bytes (%XX), and up to five depending on the encoding.
    So on average (256-66)/256, or about 74%, of the bytes in a uniformly random file will take three to five times as much space as they did originally. This encoding, however, has no headers and no other significant fixed overhead.
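The expansion described above can be measured with a short sketch. It assumes Python's urllib.parse.quote_plus as a stand-in for a browser's form encoder; its set of unescaped characters differs slightly from what browsers use, so the figure is only indicative. The random sample is a hypothetical substitute for a chunk of video data.

```python
import os
from urllib.parse import quote_plus


def urlencoded_size(data: bytes) -> int:
    """Length in bytes of `data` after percent-encoding with quote_plus."""
    # quote_plus returns an ASCII-only str, so its length equals its byte count.
    return len(quote_plus(data))


sample = os.urandom(1_000_000)  # stand-in for a chunk of binary video data
ratio = urlencoded_size(sample) / len(sample)
print(f"expansion factor: {ratio:.2f}")
```

On uniformly random input the factor should come out at roughly 2.5, since about three quarters of the bytes are tripled and the rest stay one byte long.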

    multipart/form-data instead works by dividing the data into parts and then finding a string, of any length, that doesn't occur in any part.
    This string is called the boundary; it is used to delimit the end of each part, which is transmitted as a stream of octets.
    So the file is mostly sent as is, with negligible size overhead for big enough data.
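To make the framing concrete, here is a minimal sketch of a single-part multipart/form-data body built by hand. The function name build_multipart, the field name "video" and the file name "clip.mp4" are hypothetical, and the sketch does not check that the boundary is absent from the payload.

```python
import secrets


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Wrap `payload` in a one-part multipart/form-data body; return (body, boundary)."""
    boundary = secrets.token_hex(16)  # 32 random hex characters
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    # The payload bytes are copied through untouched; only head and tail are added.
    return head + payload + tail, boundary


payload = bytes(range(256)) * 1000
body, boundary = build_multipart("video", "clip.mp4", payload)
print("overhead in bytes:", len(body) - len(payload))
```

The overhead is the same whatever the payload size. For a real multi-gigabyte upload one would stream the head, the file and the tail in sequence rather than concatenate them in memory as this sketch does.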

    The drawback is that the user agent needs to find a suitable boundary. However, a given string of length k bytes has a probability of only 2^(-8k) of appearing at any given position in a uniformly generated binary file.
    So the user agent can simply generate a random string, do a quick search, and exploit the network transmission time to hide the latency of the search.
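The generate-and-search idea can be sketched as follows. Both function names are hypothetical, the search assumes the data fits in memory, and the bound is a simple union bound that only holds under the uniform-randomness assumption made above.

```python
import secrets


def pick_boundary(data: bytes, nbytes: int = 16) -> str:
    """Return a random boundary string that does not occur anywhere in `data`."""
    while True:
        candidate = secrets.token_hex(nbytes)  # 2 * nbytes hex characters
        if candidate.encode("ascii") not in data:
            return candidate


def collision_bound(file_size: int, k: int) -> float:
    """Upper bound on the chance that a fixed k-byte string occurs in
    `file_size` uniformly random bytes (one term of 2^(-8k) per position)."""
    positions = max(file_size - k + 1, 0)
    return min(1.0, positions * 2.0 ** (-8 * k))


print(pick_boundary(b"some binary payload"))
print(collision_bound(4 * 2**30, 32))  # 4 GiB file, 32-byte boundary
```

Even for a file of several gigabytes the bound is vanishingly small for a 32-byte boundary, so the retry loop should almost never run more than once.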


    1. You should use multipart/form-data.
    2. This depends on the platform you are using. In general, if you cannot access the request body, you have to perform the encoding again yourself and measure the result.
    3. For multipart/form-data there is a small encoding overhead, usually negligible compared to the transmission time.
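For point 2, the encoded length can also be computed without materialising the encoded body, since every byte becomes either one byte or a three-byte %XX escape. This is a sketch under the assumption that the encoder behaves like Python's urllib.parse.quote_plus; the helper name encoded_length and the set of single-byte characters are tied to that assumption and may differ on other platforms.

```python
from urllib.parse import quote_plus

# Bytes that quote_plus emits as a single byte: the unreserved characters,
# plus the space, which becomes "+".
_SINGLE = (
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789_.-~ "
)


def encoded_length(chunks) -> int:
    """Total urlencoded length of an iterable of byte chunks."""
    total = 0
    for chunk in chunks:
        # Deleting the single-byte characters leaves only the bytes to escape.
        escaped = len(chunk.translate(None, _SINGLE))
        total += len(chunk) + 2 * escaped  # each escape adds two extra bytes
    return total


data = bytes(range(256)) * 4
print(encoded_length([data]), len(quote_plus(data)))
```

Because the function works chunk by chunk, a large file can be read in blocks and measured in a single pass before the request is sent.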