python http httplib

Why is the block size for Python httplib's reads hard-coded as 8192 bytes?


I'm looking to make a fast streaming download -> upload to move large files via HTTP from one server to another.

While doing this, I've noticed that httplib, which is used by urllib3 and therefore also by requests, seems to hard-code how much it fetches from a stream at a time to 8192 bytes:

https://github.com/python/cpython/blob/28453feaa8d88bbcbf6d834b1d5ca396d17265f2/Lib/http/client.py#L970

Why is this? What is the benefit of 8192 over other sizes?


Solution

  • Nginx web server

    From the nginx documentation:

    Syntax: client_body_buffer_size size;
    
    Default:    client_body_buffer_size 8k|16k;
    

    Sets buffer size for reading client request body. In case the request body is larger than the buffer, the whole body or only its part is written to a temporary file. By default, buffer size is equal to two memory pages. This is 8K on x86, other 32-bit platforms, and x86-64. It is usually 16K on other 64-bit platforms.
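The "two memory pages" default quoted above can be checked against the local machine. This is only a minimal sketch: it uses Python's `mmap.PAGESIZE`, and the variable names are illustrative. On a typical x86-64 Linux system the page size is 4096 bytes, which is where the 8K figure comes from.

```python
import mmap

# Size of one memory page on this machine, in bytes.
page_size = mmap.PAGESIZE

# nginx's documented default for client_body_buffer_size is two pages.
two_pages = 2 * page_size

print(f"page size: {page_size} bytes, two pages: {two_pages} bytes")
```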

    Apache web server

    ProxyIOBufferSize Directive
    Description:    Determine size of internal data throughput buffer
    Syntax: ProxyIOBufferSize bytes
    Default:    ProxyIOBufferSize 8192
    Context:    server config, virtual host
    Status: Extension
    Module: mod_proxy
    

    So Apache also uses 8192 by default as the proxy buffer size.

    Apache client

    The Apache HttpClient (Java) documentation says:

    https://hc.apache.org/httpcomponents-client-4.2.x/tutorial/html/connmgmt.html

    • CoreConnectionPNames.SOCKET_BUFFER_SIZE='http.socket.buffer-size': determines the size of the internal socket buffer used to buffer data while receiving / transmitting HTTP messages. This parameter expects a value of type java.lang.Integer. If this parameter is not set, HttpClient will allocate 8192 byte socket buffers.

    Ruby client

    In Ruby, the default value is 16K:

    https://github.com/ruby/ruby/blob/814daf855e0aa2c3a1164dc765378d3a092a1825/lib/net/protocol.rb#L172

    There are also these related threads:

    What is a good buffer size for socket programming?

    What is the best memory buffer size to allocate to download a file from Internet?

    Optimum file buffer read size?

    Across these sources, the consensus is a buffer size of 8K or 16K. The size does not have to be fixed at that value; it should be configurable, with 8K/16K as a default that is good enough for most situations. So I don't see a problem with Python also using 8K by default, though it should have been configurable.
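To illustrate what "configurable" means here, the following is a minimal sketch of a streaming copy loop whose chunk size is a parameter rather than a constant. `copy_stream` and its argument names are hypothetical helpers written for this example, not part of httplib; the in-memory streams stand in for a download response and an upload target.

```python
import io


def copy_stream(src, dst, chunk_size=8192):
    """Copy src to dst, reading at most chunk_size bytes per call.

    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for the download and upload ends.
source = io.BytesIO(b"x" * 100_000)
sink = io.BytesIO()
copied = copy_stream(source, sink, chunk_size=16 * 1024)
```

The standard library's `shutil.copyfileobj(fsrc, fdst, length)` follows the same pattern, so in practice it may be enough to call that with a larger `length`.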

    Python 3.7 makes this configurable through a blocksize parameter on HTTPConnection, but that won't help if you can't upgrade.
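As a rough sketch of the Python 3.7+ option, `http.client.HTTPConnection` accepts a `blocksize` argument that sets the buffer size used when sending a file-like message body. The host name, the `upload` helper and the 64K value below are assumptions chosen for illustration; this only affects `http.client` directly, so check separately whether a higher-level library passes the setting through.

```python
import http.client

CHUNK = 64 * 1024  # illustrative value, not a recommendation

# Hypothetical host. Constructing the connection does not open a socket;
# that happens on the first request.
conn = http.client.HTTPConnection("upload.example.com", blocksize=CHUNK)


def upload(connection, path, fileobj, length):
    """PUT a file-like body; http.client reads it blocksize bytes at a time."""
    connection.request(
        "PUT",
        path,
        body=fileobj,
        headers={"Content-Length": str(length)},
    )
    return connection.getresponse().status
```

Calling `upload(conn, "/target", open("big.bin", "rb"), size)` against a real server would then read the file in 64K blocks instead of the 8192-byte default.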