Tags: next.js, caching, vercel, app-router, static-site-generation

Next static page on Vercel first load is not cached


I have some statically generated pages in my Next app (app router).

When I build the app locally, those pages are rendered immediately on navigation, as I would expect.

When the same app is deployed to Vercel, however, the first load of each of those pages lags and takes some 500ms. Only after that first render are the pages cached and rendered immediately.

Am I missing some cache-control configuration? I thought all this was supposed to work out of the box?

I tried going through the Next and Vercel docs on caching, but they don't seem very helpful in this case. They all say it should work out of the box with no further configuration needed.

For reference, the Cache-Control value on Vercel is public, max-age=0, must-revalidate. On localhost (running the build), it is s-maxage=31536000.


Solution

  • It looks like the problem you're describing is a long 'cold start' caused by the pull-model architecture of Vercel's CDN.

    Just for some background:

    Cold starts usually refer to serverless/cloud functions, but CDNs can also cause them. At their foundation, CDNs are simply multiple servers that store static content (pages) in different parts of the world, so that a server in close proximity to the user is always available.

    These servers are connected to a central origin server through a 'pull model' architecture. This means a server other than the origin only receives a new file the FIRST time that file is requested from it.

    Every time you push a change, Vercel updates the content at the origin server and invalidates the previously cached content on all other servers.

    From here, let's examine what happens from a user's perspective. For the user, there are actually two caches: the browser and the CDN each hold one.

    Let's examine two scenarios:

    1. A user has previously visited the page & max-age key is not expired
    2. Every other condition besides the first

    If the user has visited the page before and max-age has not expired

    1. The browser checks its own cache for the files. Any changes at the CDN level are essentially ignored.
    2. The static pages are served immediately with almost no wait (<100ms)
    3. To alter this behavior, set a lower max-age on your static files so that the CDN is hit more frequently. However, this might increase your hosting costs. For testing, bypass the browser cache by loading the page in incognito mode.
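
    As a rough illustration of how max-age and s-maxage apply to the two caches, the two Cache-Control values from the question can be compared with a small sketch. The helper names parseCacheControl and freshnessLifetime are hypothetical, and the logic is a simplification of the HTTP caching rules: it ignores heuristic freshness and the Age and Expires headers.

```typescript
// Parse a Cache-Control header value into a directive map.
// Directive names are lowercased; flags without a value map to `true`.
function parseCacheControl(header: string): Record<string, string | true> {
  const directives: Record<string, string | true> = {};
  for (const part of header.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) {
      directives[trimmed.toLowerCase()] = true;
    } else {
      const name = trimmed.slice(0, eq).trim().toLowerCase();
      const value = trimmed.slice(eq + 1).trim().replace(/^"|"$/g, "");
      directives[name] = value;
    }
  }
  return directives;
}

// Explicit freshness lifetime in seconds for a browser cache or a shared
// (CDN) cache. Returns undefined when the header states no explicit lifetime.
// Simplification: no-store and no-cache are both treated as a lifetime of 0.
function freshnessLifetime(
  header: string,
  cache: "browser" | "shared"
): number | undefined {
  const d = parseCacheControl(header);
  if (d["no-store"] !== undefined || d["no-cache"] !== undefined) return 0;
  // s-maxage only applies to shared caches and takes priority there.
  const sMaxAge = d["s-maxage"];
  if (cache === "shared" && typeof sMaxAge === "string") return Number(sMaxAge);
  const maxAge = d["max-age"];
  return typeof maxAge === "string" ? Number(maxAge) : undefined;
}
```

    With the header from the Vercel deployment (public, max-age=0, must-revalidate), the browser lifetime is 0 seconds, so the browser revalidates on every navigation. The localhost value (s-maxage=31536000) only sets a lifetime for shared caches and says nothing explicit to the browser.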

    If the user has not visited the page before, max-age has expired, or the page is loaded in incognito mode

    1. The browser sends a request to the nearest CDN server.
    • If the nearest one is the origin, the content is immediately sent to the browser. This takes under 100ms.

    • If the nearest server is NOT the origin and this is the FIRST request after invalidation, the CDN server must first fetch the content from the origin server. This takes ~200ms, depending on the size of the file. Only then can the server send the downloaded content to the user.

    • If the nearest server is NOT the origin but this is a LATER request after invalidation, the nearest server already holds the new content and sends it immediately (~100ms).

    2. Then the browser downloads your static files and serves them (~200ms)
    3. In total, this potential three-node hop takes ~300 to ~500ms, depending on whether the user is the first request or not
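
    The request flow above can be sketched as a toy in-memory model. This is only an illustration of a pull-through cache under the assumptions in this answer, not Vercel's actual implementation; the class PullCdn, its methods, and the edge names are all hypothetical.

```typescript
type Source = "edge-hit" | "origin-pull";

// Toy model of a pull-based CDN: edges start empty after each deploy and
// only copy a file from the origin the first time it is requested.
class PullCdn {
  private origin: Record<string, string> = {};
  private edges: Record<string, Record<string, string>> = {};

  constructor(edgeNames: string[]) {
    for (const name of edgeNames) this.edges[name] = {};
  }

  // A deploy replaces the origin content and invalidates every edge cache.
  deploy(files: Record<string, string>): void {
    this.origin = { ...files };
    for (const name of Object.keys(this.edges)) this.edges[name] = {};
  }

  // Serve from the edge if it holds the file; otherwise pull from the
  // origin, store a copy on that edge, and then serve it.
  request(edgeName: string, path: string): { body: string; source: Source } {
    const cache = this.edges[edgeName];
    if (cache === undefined) throw new Error(`unknown edge: ${edgeName}`);
    const cached = cache[path];
    if (cached !== undefined) return { body: cached, source: "edge-hit" };
    const body = this.origin[path];
    if (body === undefined) throw new Error(`not found: ${path}`);
    cache[path] = body;
    return { body, source: "origin-pull" };
  }
}
```

    In this model, the first request to an edge after a deploy reports "origin-pull" (the slow path), and every later request to the same edge reports "edge-hit" until the next deploy clears the caches again.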

    So, back to the original concern: why does your static page seem to take ~500ms to load on the first request? It's simply because this is how the pull model for CDNs works. This is an architectural decision made by most CDN providers. Any request after the first should be speedy, either because of the user's browser cache or because a CDN server near the user already holds the updated content. Vercel has probably chosen the pull model because it's vastly less expensive to maintain a CDN service with it.

    If they updated the files on all geographic/CDN servers whenever the origin server changed (a push-based model), it could get expensive very fast, since apps can have hundreds to thousands of files, all of which would need to be propagated to every CDN server at once. Scale this to the thousands of apps hosted on Vercel, and you could have millions to billions of transfers happening at a time as people update their apps. The pull model therefore only updates the geographically nearest CDN server when a real user actually needs the files (usually at first request), instead of updating all servers at once.
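
    The cost difference can be made concrete with some back-of-the-envelope arithmetic. The function names below are hypothetical and the model is deliberately crude: it only counts origin-to-edge transfers per deploy and ignores file sizes, retries, and cache expiry.

```typescript
// Push model: every file is copied to every edge on each deploy.
function pushTransfers(fileCount: number, edgeCount: number): number {
  return fileCount * edgeCount;
}

// Pull model: a file is copied to an edge only the first time that edge
// is asked for it, so count the distinct (edge, path) pairs requested.
function pullTransfers(
  requests: Array<{ edge: string; path: string }>
): number {
  const seen: Record<string, true> = {};
  let count = 0;
  for (const r of requests) {
    const key = JSON.stringify([r.edge, r.path]);
    if (!seen[key]) {
      seen[key] = true;
      count++;
    }
  }
  return count;
}
```

    For example, with made-up figures of 1,000 files and 100 edge servers, a push deploy costs 100,000 transfers up front, while a pull deploy costs one transfer per file per edge that users actually request, however many times they request it afterwards.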

    For most apps, this is a non-issue because they have thousands of users at a time, so one user experiencing a slower load every now and then is not a huge concern, especially if it saves the CDN provider, and by extension the developer, a significant amount of money.

    If the first-request lag is still unacceptable, I would look into reducing the size of these static files/pages to improve this time.

    This is a rather oldish article, but it details how Vercel's CDN service is pull-based. It also describes how using Vercel's Edge Config would turn it into a push-based service, where changes pushed to the origin are propagated to all CDN servers. This is the expensive model I described above, and as a result it would incur an additional cost. That is why it is opt-in rather than the default behavior for Vercel apps.