http-headers, html, dynamically-generated, http-caching

Understanding caching strategies of a dynamically generated search page


While studying the caching strategies used by various search engines and by Stack Overflow itself, I noticed subtle differences in the response headers:

Google Search

Cache-Control: private, max-age=0
Expires: -1

Yahoo Search

Cache-Control: private
Connection: Keep-Alive
Keep-Alive: timeout=60, max=100

Stack Overflow Search

Cache-Control: private

There must be a logical explanation for each of these settings. Could someone explain the differences so that all of us can learn from them?


Solution

  • From RFC2616 HTTP/1.1 Header Field Definitions, 14.9.1 What is Cacheable:

    private
       Indicates that all or part of the response message is intended for a single
       user and MUST NOT be cached by a shared cache. This allows an origin server
       to state that the specified parts of the response are intended for only one
       user and are not a valid response for requests by other users. A private
       (non-shared) cache MAY cache the response.
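
As a rough illustration of the definition quoted above, here is a minimal sketch of the storage decision a cache might make. The function name `may_store` and the `shared_cache` flag are hypothetical, and the sketch ignores most of the real rules (for example, `private` with a field-name argument that restricts only part of the response).

```python
def may_store(cache_control: str, shared_cache: bool) -> bool:
    """Decide whether a cache may store a response (simplified sketch)."""
    # Collect directive names, dropping any "=value" argument.
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    # no-store forbids storage by any cache.
    if "no-store" in directives:
        return False
    # private forbids storage by shared caches (proxies, CDNs) only.
    # Assumption: any "private" is treated as covering the whole response.
    if shared_cache and "private" in directives:
        return False
    return True


print(may_store("private, max-age=0", shared_cache=True))   # proxy
print(may_store("private, max-age=0", shared_cache=False))  # browser
```

Under this sketch, all three search pages in the question are off limits to a proxy cache but may still be kept in the user's own browser cache.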
    

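    max-age=0 means the response is fresh for 0 seconds, so it is already stale when it arrives. A private cache may still store it, but it has to revalidate the copy with the origin server before reusing it, so in practice nothing is served from cache without a check.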

    Expires: -1 is ignored when max-age is present. On its own, -1 is not a valid date and must be treated as a time in the past (meaning already expired).

    From RFC2616 HTTP/1.1 Header Field Definitions, 14.21 Expires:

    Note: if a response includes a Cache-Control field with the max-age directive
          (see section 14.9.3), that directive overrides the Expires field
    
    HTTP/1.1 clients and caches MUST treat other invalid date formats, especially
    including the value "0", as in the past (i.e., "already expired").
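
The two rules quoted above (max-age overrides Expires, and an invalid Expires counts as already expired) can be sketched as a small freshness calculation. This is only an illustration: `parse_cache_control` and `freshness_lifetime` are hypothetical helper names, and a real cache also considers the Age and Date headers and heuristic freshness.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers, now=None):
    """Seconds the response stays fresh, or None if no explicit lifetime."""
    now = now or datetime.now(timezone.utc)
    directives = parse_cache_control(headers.get("Cache-Control", ""))

    # Rule 1: max-age, when present, overrides Expires.
    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit():
        return int(max_age)

    expires = headers.get("Expires")
    if expires is None:
        return None

    # Rule 2: an unparseable date such as "-1" or "0" means already expired.
    try:
        expiry = parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        return 0
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    return max(0, int((expiry - now).total_seconds()))


google = {"Cache-Control": "private, max-age=0", "Expires": "-1"}
stackoverflow = {"Cache-Control": "private"}
print(freshness_lifetime(google))
print(freshness_lifetime(stackoverflow))
```

For the Google headers the sketch returns an explicit lifetime of zero, while for the Yahoo and Stack Overflow headers it returns None: those responses state no lifetime at all, which leaves a private cache to fall back on its own heuristics.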
    

    The Connection: Keep-Alive and Keep-Alive: timeout=60, max=100 headers configure persistent connections and are unrelated to caching. All HTTP/1.1 connections are persistent unless otherwise specified, but these headers state explicit limits (an idle timeout of 60 seconds and at most 100 requests per connection) instead of relying on the browser's default, which varies greatly.
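
To make the Keep-Alive parameters concrete, here is a minimal sketch that reads them into a dictionary. The helper name `parse_keep_alive` is hypothetical, and the sketch assumes every parameter is a plain `name=integer` pair, skipping anything else.

```python
def parse_keep_alive(value: str) -> dict:
    """Parse a Keep-Alive header value such as 'timeout=60, max=100'."""
    params = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        # Keep only well-formed name=integer pairs (assumption of this sketch).
        if sep and arg.strip().isdigit():
            params[name.strip().lower()] = int(arg)
    return params


yahoo = parse_keep_alive("timeout=60, max=100")
print(yahoo["timeout"], yahoo["max"])
```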