tl;dr: My Service Worker is caching HTML pages and CSS files in different versions. Since I have to limit the number of files I’m caching, how can I make sure that, for each HTML page in the cache, the versioned CSS files it needs are also in the cache once the user goes offline? I need to delete old CSS files at some point, and they have no direct relation to the HTML files.
I’m trying to turn a traditional website into a PWA (sort of) by implementing caching strategies with a Service Worker (I’m using Workbox, but the question is meant to be more general).
I’m caching pages as I navigate through the website (network-first strategy), in order to make them available offline.
I’m also caching a limited number of CSS and JS assets with a cache-first strategy. The URLs pointing to them are already "cachebusted" with a timestamp embedded in the filename: front.320472502.css, for instance. Because of this cachebusting technique, I only need/want to keep a small number of assets in this cache.
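For reference, here is roughly what the routing part of my worker looks like (a simplified sketch using Workbox v6 APIs; the cache names and the maxEntries value are just examples):

import {registerRoute} from 'workbox-routing';
import {NetworkFirst, CacheFirst} from 'workbox-strategies';
import {ExpirationPlugin} from 'workbox-expiration';

// Network-first for navigations, so pages get cached as I browse.
registerRoute(
  ({request}) => request.mode === 'navigate',
  new NetworkFirst({cacheName: 'pages'}),
);

// Cache-first for the cachebusted CSS/JS, capped at a few entries.
registerRoute(
  ({request}) =>
    request.destination === 'style' || request.destination === 'script',
  new CacheFirst({
    cacheName: 'assets',
    plugins: [new ExpirationPlugin({maxEntries: 10})],
  }),
);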
Now here’s the issue I’m having. Let’s suppose I cached the page /contact, which referenced /front.123.css (so that file was cached too). As I navigate to other pages, the CSS changes several times, and my CSS cache might now contain only /front.455.css and /front.456.css. If I go offline at this point, trying to load /contact will successfully retrieve the contents of the page, but the CSS will fail to load because it’s no longer in the cache, and the page will render unstyled.
Either I keep old versions of my CSS in the cache for a long time, which is not ideal, or I purge a cached CSS file only when it is not required by any of the cached pages. But how would you go about that? Looping through the cached pages, looking for the front.123.css string?
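Something along those lines is what I have in mind, but it feels brittle. An untested sketch (the pages and assets cache names match my setup above):

async function purgeUnreferencedAssets(): Promise<void> {
  // Read the body of every cached HTML page.
  const pagesCache = await caches.open('pages');
  const pageBodies = await Promise.all(
    (await pagesCache.keys()).map(async (request) => {
      const response = await pagesCache.match(request);
      return response ? response.text() : '';
    }),
  );

  // Delete any cached asset whose filename no longer appears
  // in at least one cached page.
  const assetsCache = await caches.open('assets');
  for (const request of await assetsCache.keys()) {
    const filename = new URL(request.url).pathname.split('/').pop() ?? '';
    if (!pageBodies.some((body) => body.includes(filename))) {
      await assetsCache.delete(request);
    }
  }
}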
Another solution might be to respond with an offline page rather than an unstyled content page, but I’m not sure that’s doable, since the worker responds with the HTML before knowing which assets the page will need.
The "best" solution here is to use precaching (either via Workbox, or via some other way of generating a build-time manifest), and making sure that all of your HTML and subresources are cached and expired atomically. You don't have to worry about version mismatches or cache misses if you can precache everything.
That being said, precaching everything isn't always a viable option: your site might rely on a lot of dynamic, server-rendered content, have a lot of distinct HTML pages, or use a large variety of subresources, many of which are only required on a subset of pages.
If you want to go with the runtime caching approach, I'd recommend a technique along the lines of what's described in "Smarter runtime caching of hashed assets". That uses a custom Workbox plugin to handle cache expiration and finding a "best-effort" cache match for a given subresource when the network is unavailable. The main difficulty in generalizing that code is that you need to use a consistent naming scheme for your hashes, and write some utility functions to programmatically translate a hashed URL into the "base" URL.
In the interest of providing some code along with this answer, here's a version of the plugin that I currently use. You'll need to customize it as described above for your hashing scheme, though.
import {WorkboxPlugin} from 'workbox-core';
import {HASH_CHARS} from './path/to/constants';

// Strips the leading hash (HASH_CHARS characters plus the separator)
// from a hashed filename. Adjust this to match your naming scheme.
function getOriginalFilename(hashedFilename: string): string {
  return hashedFilename.substring(HASH_CHARS + 1);
}

function parseFilenameFromURL(url: string): string {
  const urlObject = new URL(url);
  return urlObject.pathname.split('/').pop() ?? '';
}

// Two URLs "match" if they refer to the same underlying asset,
// regardless of which hashed revision each one points to.
function filterPredicate(
  hashedURL: string,
  potentialMatchURL: string,
): boolean {
  const hashedFilename = parseFilenameFromURL(hashedURL);
  const hashedFilenameOfPotentialMatch =
    parseFilenameFromURL(potentialMatchURL);

  return (
    getOriginalFilename(hashedFilename) ===
    getOriginalFilename(hashedFilenameOfPotentialMatch)
  );
}

export const revisionedAssetsPlugin: WorkboxPlugin = {
  // Stash the cache name so that handlerDidError can use it later.
  cachedResponseWillBeUsed: async ({cacheName, cachedResponse, state}) => {
    if (state) {
      state.cacheName = cacheName;
    }
    return cachedResponse;
  },

  // Once a new revision of an asset has been cached, delete any
  // older revisions of the same underlying asset.
  cacheDidUpdate: async ({cacheName, request}) => {
    const cache = await caches.open(cacheName);
    const keys = await cache.keys();

    for (const key of keys) {
      if (filterPredicate(request.url, key.url) && request.url !== key.url) {
        await cache.delete(key);
      }
    }
  },

  // If fetching a given revision fails (e.g. while offline), fall
  // back to any cached revision of the same underlying asset.
  handlerDidError: async ({request, state}) => {
    if (state?.cacheName) {
      const cache = await caches.open(state.cacheName);
      const keys = await cache.keys();

      for (const key of keys) {
        if (filterPredicate(request.url, key.url)) {
          return cache.match(key);
        }
      }
    }
  },
};
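You'd then attach the plugin to whatever runtime caching strategy handles your hashed subresources. As a hypothetical example (the cache name, import path, and route match are placeholders for your own setup):

import {registerRoute} from 'workbox-routing';
import {CacheFirst} from 'workbox-strategies';
import {revisionedAssetsPlugin} from './path/to/plugin';

registerRoute(
  ({request}) =>
    request.destination === 'style' || request.destination === 'script',
  new CacheFirst({
    cacheName: 'hashed-assets',
    plugins: [revisionedAssetsPlugin],
  }),
);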