How to cache based on size in Varnish?


I've been trying to make Varnish cache responses based on their size.
Other answers suggest using Content-Length to decide whether or not to cache, but I'm using InfluxDB as the backend (Varnish reverse-proxies to it). It responds with Transfer-Encoding: chunked, which omits the Content-Length header, so I can't figure out the size of the response.
Is there any way to access the response body size and make the decision in vcl_backend_response?


Solution

    Cache miss: chunked transfer encoding

    When Varnish processes incoming chunks from the origin, it has no idea ahead of time how much data will be received. Varnish streams the data through to the client and stores it byte by byte.

    Once the terminating chunk (0\r\n\r\n) is received to mark the end of the stream, Varnish finalizes the object in storage and calculates the total number of bytes.

    Cache hit: content length

    The next time the object is requested, Varnish no longer needs to use Chunked Transfer Encoding, because it has the full object in cache and knows the size. At that point a Content-Length header is part of the response, but this header is not accessible in VCL because it seems to be generated after sub vcl_deliver {} is executed.
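
    To see this difference from the client side, you can look at how the response body is framed on the first and second request. The sketch below is only an illustration: body_framing is a made-up helper name, and the localhost:6081 address in the comment is an assumption about where your Varnish listens.

    ```shell
    # Report how a response body was framed, given its raw headers on stdin.
    # Prints "chunked", "content-length <n>", or "unknown".
    body_framing() {
        tr -d '\r' | awk -F': *' '
            tolower($1) == "transfer-encoding" && tolower($2) ~ /chunked/ {
                print "chunked"; found = 1; exit
            }
            tolower($1) == "content-length" {
                print "content-length " $2; found = 1; exit
            }
            END { if (!found) print "unknown" }
        '
    }

    # Hypothetical usage against a running Varnish (address is an assumption):
    #   curl -s -o /dev/null -D - http://localhost:6081/ | body_framing
    ```

    If the behaviour described above holds for your setup, the first request (miss) should report chunked and a repeat request (hit) should report a content length.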

    Remove objects after the fact

    It is possible to remove objects after the fact by monitoring their size through VSL.

    The following command looks at the BereqAcct (backend request accounting) field of the VSL output and checks its fifth counter, the number of body bytes received from the backend. If that is greater than 5 MB (5 × 1024 × 1024 = 5242880 bytes), it generates output:

    varnishlog -g request -i berequrl -q "BereqAcct[5] > 5242880"
    

    Here's some potential output:

    *   << Request  >> 98330
    **  << BeReq    >> 98331
    --  BereqURL       /
    

    At that point, you know that the / resource is bigger than 5 MB. You can then attempt to remove it from the cache using the following command:

    varnishadm ban "obj.http.x-url == / && obj.http.x-host == domain.com"
    

    Replace domain.com with the actual hostname of your service and set / to the URL of the actual endpoint you're trying to remove from the cache.
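
    If you want to go from the varnishlog output to the ban commands without copying URLs by hand, something like the following could work. It is a rough sketch: ban_commands is a made-up helper name, it only prints the commands instead of running them, and it assumes URLs contain no spaces or characters that need quoting in a ban expression.

    ```shell
    # Read varnishlog output on stdin and print one ban command per unique
    # BereqURL. The hostname is passed as the first argument.
    # This is a dry run: nothing is sent to Varnish.
    ban_commands() {
        host=$1
        awk -v host="$host" '
            # Lines look like: "--  BereqURL       /"
            $2 == "BereqURL" && !seen[$3]++ {
                printf "varnishadm ban \"obj.http.x-url == %s && obj.http.x-host == %s\"\n", $3, host
            }
        '
    }

    # Hypothetical usage (domain.com is a placeholder):
    #   varnishlog -d -g request -i berequrl -q "BereqAcct[5] > 5242880" | ban_commands domain.com
    ```

    The -d flag makes varnishlog process the existing log entries and exit, so the pipeline terminates. Review the printed commands first, then run them yourself or pipe them to sh.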

    Don't forget to add the following code to your VCL file to ensure that the x-url and x-host headers are available:

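    sub vcl_backend_response {
        # Store the request URL and host on the cached object so bans can match on them.
        set beresp.http.x-url = bereq.url;
        set beresp.http.x-host = bereq.http.host;
    }
    
    sub vcl_deliver {
        # Strip the helper headers so they are not exposed to clients.
        unset resp.http.x-url;
        unset resp.http.x-host;
    }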
    

    Conclusion

    There's no turn-key solution to access the size of the body in VCL. The hacky approach I suggested, removing objects after the fact, is the only thing I can think of.