Sorry, I don't have a reproducible test case of this issue because I've never even seen it happen myself. I only know it happens because of client-side logging in my app and complaints from users.
The problem is: some users end up running an old version of my app even after they have previously run a newer version.
I'm using a service worker, which I was hoping would provide some guarantees against that scenario. It's particularly troubling when the new version includes an IndexedDB schema migration, because then the old version of my app won't even work anymore.
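To make that last point concrete, here is a minimal sketch (the database name and version numbers are made up) of why the old code breaks after a migration: IndexedDB refuses to open a database at a version lower than the one already on disk.

// Suppose the new app code migrated the database to version 6. When the old
// app code, which still asks for version 5, runs afterwards, the open request
// fails with a VersionError instead of succeeding.
const request = indexedDB.open("my-app-db", 5);

request.onsuccess = () => {
  // Never reached once the on-disk database is already at version 6.
};

request.onerror = () => {
  // request.error.name === "VersionError"
  console.error("Old code cannot open the newer database:", request.error);
};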
More details:
I'm using Workbox 4.3.1. My service worker is basically:
importScripts("https://storage.googleapis.com/workbox-cdn/releases/4.3.1/workbox-sw.js");

workbox.precaching.precacheAndRoute([]);

workbox.routing.registerNavigationRoute("/index.html", {
  blacklist: [
    new RegExp("^/static"),
    new RegExp("^/sw.js"),
  ],
});
The empty array in workbox.precaching.precacheAndRoute([]) gets filled in by workboxBuild.injectManifest.
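For reference, the build step that fills in that array is roughly this (a sketch of the workbox-build 4.x injectManifest call; the paths and glob patterns here are placeholders, not my exact config):

const workboxBuild = require("workbox-build");

// Replaces the empty array in precacheAndRoute([]) in the source service
// worker with the list of built assets and their revision hashes.
workboxBuild
  .injectManifest({
    swSrc: "src/sw.js",
    swDest: "build/sw.js",
    globDirectory: "build",
    globPatterns: ["**/*.{js,css,html,png}"],
  })
  .then(({ count, size, warnings }) => {
    warnings.forEach(warning => console.warn(warning));
    console.log(`Precached ${count} files, ${size} bytes total.`);
  });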
I can manually confirm that the right files get filled in. And generally the service worker works: I can see it in the browser dev tools, I can disconnect from the Internet and still use my app, and everything seems fine. Like I said above, I've never seen this problem happen myself, and I don't have a reproducible test case.
But some of my users experienced the problem described above, so I tried to use client-side error logging to investigate. I added code to my app that stores its version number in localStorage, and on initial load compares that stored version against the currently running version. If the version in localStorage is more recent than the running version (i.e., the user successfully ran a newer version in the past but is now back on an older one), it logs both version numbers along with some additional information:
// Runs inside an async function; log() is my app's client-side error logging helper.
let registrations = [];
if (window.navigator.serviceWorker) {
  registrations = await window.navigator.serviceWorker.getRegistrations();
}

log({
  hasNavigatorServiceWorker:
    window.navigator.serviceWorker !== undefined,
  registrationsLength: registrations.length,
  registrations: registrations.map(r => {
    return {
      scope: r.scope,
      active: r.active
        ? {
            scriptURL: r.active.scriptURL,
            state: r.active.state,
          }
        : null,
      installing: r.installing
        ? {
            scriptURL: r.installing.scriptURL,
            state: r.installing.state,
          }
        : null,
      waiting: r.waiting
        ? {
            scriptURL: r.waiting.scriptURL,
            state: r.waiting.state,
          }
        : null,
    };
  }),
});
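For completeness, the check that decides whether to emit that log entry is roughly the following. The localStorage key, version format, and comparison helper are simplified stand-ins for my actual code, and logVersionRegression represents the logging code shown above:

const CURRENT_VERSION = "5.2.0"; // placeholder for the currently running build's version

// Returns true if dotted numeric version a is newer than version b.
const isNewerVersion = (a, b) => {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    if ((pa[i] || 0) !== (pb[i] || 0)) {
      return (pa[i] || 0) > (pb[i] || 0);
    }
  }
  return false;
};

const storedVersion = localStorage.getItem("lastSeenVersion");
if (storedVersion !== null && isNewerVersion(storedVersion, CURRENT_VERSION)) {
  // A newer version ran in the past but an older one is running now, so
  // gather the service worker details shown above and log everything.
  logVersionRegression(storedVersion, CURRENT_VERSION);
}

// Remember the newest version seen so far.
if (storedVersion === null || isNewerVersion(CURRENT_VERSION, storedVersion)) {
  localStorage.setItem("lastSeenVersion", CURRENT_VERSION);
}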
Looking at my logs, I see that this problem occurs for only about 1% of my users. Firefox is overrepresented among them (4% of overall traffic, but 18% of log entries for this problem), but it happens on all browsers and operating systems.
And I see that almost all records have these values:
{
  hasNavigatorServiceWorker: true,
  registrationsLength: 1,
  registrations: [{
    "scope": "https://example.com/",
    "active": {
      "scriptURL": "https://example.com/sw.js",
      "state": "activated"
    },
    "installing": null,
    "waiting": null
  }]
}
As far as I know, those are all correct values.
I should also note that my JavaScript files have a hash in the URL, so it cannot be that my server is somehow returning an old version of my JavaScript when the user requests a new version.
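To illustrate what I mean, each build's precache manifest ends up containing entries along these lines (the file names and hashes are made up; the real entries are generated by injectManifest):

workbox.precaching.precacheAndRoute([
  // Content-hashed file names: a new build produces new URLs, so an old
  // cached response can never be returned for a newly requested URL.
  { url: "/gen/app-3f9a2c1b.js", revision: "3f9a2c1b" },
  { url: "/gen/ui-8d04e7aa.js", revision: "8d04e7aa" },
  { url: "/index.html", revision: "6b2d8c0f" },
]);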
So, what could be going on? How can this observed behavior be explained? What else could I be logging to debug this further?
The only scenarios I can come up with seem highly implausible to me. Like...
But I have no evidence of the service worker ever failing. Nothing in my error logs. No user complaining that offline support is broken.
If it helps, the actual website where this happens is https://play.basketball-gm.com/, the service worker is at https://play.basketball-gm.com/sw.js, and all the code is available on GitHub.
And this problem has been going on ever since I started using a service worker, about a year ago. I am just finally getting around to writing up a Stack Overflow question, because I've given up hope that I'll be able to figure it out on my own or even create a reproducible test case.
A year later and I've finally figured it out.
I was precaching an asset that was blocked by some ad blockers. This caused installation of the new service worker to fail, which kept the old service worker (and the old precached files it serves) in control longer than intended. Somehow this failure was never caught by my client-side error logging.
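For anyone debugging something similar: this kind of install failure can be surfaced from the page by watching a new worker's state and logging when it ends up "redundant", which is what happens when installation fails. A sketch of that idea (not the exact code I ended up with; log() is the same logging helper as above):

navigator.serviceWorker.register("/sw.js").then(registration => {
  registration.addEventListener("updatefound", () => {
    const newWorker = registration.installing;
    if (!newWorker) {
      return;
    }

    newWorker.addEventListener("statechange", () => {
      // A worker whose installation fails (e.g. because a precached asset is
      // blocked by an ad blocker) becomes "redundant" without ever activating,
      // leaving the old worker in control of the page.
      if (newWorker.state === "redundant") {
        log({ serviceWorkerInstallFailed: true });
      }
    });
  });
});

Note that a worker also becomes redundant when it is superseded by an even newer one, so this is a signal to investigate rather than a definite error.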