Duplicate fetch request when precaching with UTM query


I was going to post this in the Workbox GitHub repo, but I doubt it's a bug; it's more likely my own misunderstanding. I have found some similar questions, but none of the answers clearly explain how I can resolve my issue.

In my sw.js file I am precaching the Home URL and the Start URL. The Start URL is exactly the same as the Home URL, except that it appends ?utm_source=pwa. I've read that others use this technique to track PWA usage in Google Analytics, and I like the idea.

However, now when a new user arrives at the website, they load the initial page and then Workbox fetches the Home URL and then fetches the Start URL. This means that if the user arrives at the homepage of the website they will have loaded that page 3 times. I'd like to figure out how to get Workbox to realize that the Home URL and Start URL are essentially the same and to not need that third fetch request.

I understand that ignoreURLParametersMatching defaults to [/^utm_/], so I would expect it to behave as I described above, but perhaps I misunderstand it and it does not apply to precached URLs...? Does it apply automatically if I don't explicitly pass it to precacheAndRoute()?

To clarify, my expectation of ignoreURLParametersMatching is that Workbox precaches the Home URL and then, when it attempts to cache the Start URL, ignores (removes) the UTM parameter, sees that it already has that URL cached, and does not fetch. Then, when the Start URL is requested from cache, it would again ignore the UTM parameter and respond with the URL it has in cache. Is this far off from reality? If so, how should I set this up to keep my tracking and avoid the "duplicate" fetch?

Here are some excerpts of my sw.js file:

const HOME_URL = 'https://gearside.com/nebula/';
const START_URL = 'https://gearside.com/nebula/?utm_source=pwa';
workbox.precaching.precacheAndRoute([
    //...other precached files
    {url: HOME_URL, revision: revisionNumber},
    {url: START_URL, revision: revisionNumber},
]);

Both URLs are precached:

[Screenshot: console precache]

The console shows both fetch requests:

[Screenshot: console fetches]

Note: I've noticed this problem with or without revision numbers.


Solution

    TL;DR

    • Do not include https://gearside.com/nebula/?utm_source=pwa in the precache manifest.
    • Use the workbox-google-analytics module:
    import * as googleAnalytics from 'workbox-google-analytics';
    
    googleAnalytics.initialize();
    

    Long version

    You should precache based on unique resources. Every entry defined in the precache manifest will be downloaded and cached.

    If https://gearside.com/nebula/ and https://gearside.com/nebula/?utm_source=pwa serve the exact same content, only precache one of them (preferably the one without the query string).
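
As a rough sketch of the "one entry per unique resource" rule, the manifest could be filtered before it is handed to precacheAndRoute(). The helper names stripParams and dedupeManifest and the revision value 'abc123' are assumptions made up for this illustration; they are not part of Workbox.

```javascript
// Hypothetical helpers (not a Workbox API): drop manifest entries that point
// at the same resource once ignorable query parameters are removed.
function stripParams(href, ignore) {
  const url = new URL(href);
  // Copy the keys first, because deleting while iterating can skip entries.
  for (const name of [...url.searchParams.keys()]) {
    if (ignore.some((re) => re.test(name))) {
      url.searchParams.delete(name);
    }
  }
  return url.href;
}

function dedupeManifest(entries, ignore = [/^utm_/]) {
  const seen = new Set();
  const unique = [];
  for (const entry of entries) {
    const key = stripParams(entry.url, ignore);
    if (seen.has(key)) continue; // same resource is already listed
    seen.add(key);
    unique.push({ ...entry, url: key }); // keep the URL without ignored parameters
  }
  return unique;
}

// 'abc123' is a placeholder revision, not a value from the question.
const manifest = dedupeManifest([
  { url: 'https://gearside.com/nebula/', revision: 'abc123' },
  { url: 'https://gearside.com/nebula/?utm_source=pwa', revision: 'abc123' },
]);
console.log(manifest.map((entry) => entry.url)); // a single Home URL entry remains
```

In sw.js the filtered array would then be passed to precacheAndRoute(). A request for the ?utm_source=pwa Start URL should still be answered from that single entry, because the default ignoreURLParametersMatching strips utm_ parameters when matching requests.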

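    The ignoreURLParametersMatching option takes an array of regexes that are tested against the names of the query parameters in an incoming request. Any parameter whose name matches is stripped from the request URL before Workbox looks for a match in the precache. It applies only when matching requests, not when the manifest entries are downloaded at install time, which is why both of your URLs are fetched.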

    For example,

    precacheAndRoute([
      {url: '/styles/main.css', revision: '777'},
    ], {
      ignoreURLParametersMatching: [/.*/]
    });
    

    will match any of these requests:

    • /styles/main.css
    • /styles/main.css?minified=0
    • /styles/main.css?minified=0&renew=1

    and serve /styles/main.css, because the regex .* matches every query parameter name, so all parameters are ignored.

    The default value of ignoreURLParametersMatching is [/^utm_/]. If we omit ignoreURLParametersMatching in the example above, any of the following requests would still be matched (and resolved with the precached /styles/main.css):

    • /styles/main.css
    • /styles/main.css?utm_hello=yes
    • /styles/main.css?utm_yes_what=dunno&utm_really=yeah

    But the following requests will not be served from the precache:

    • /styles/main.css?remodelate=expensive&utm_pwa=no
    • /styles/main.css?utm_spa=neither&trees=awesome

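    because neither of them contains only query parameters starting with utm_. Once the utm_ parameters are stripped, another parameter remains, so the URL still differs from the precached one.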
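
The matching behaviour described above can be approximated in a few lines. This is a simplified sketch rather than Workbox's actual source: the function name matchesPrecache and the https://example.com origin are assumptions, and real Workbox also tries other URL variations (such as directory indexes) that are left out here.

```javascript
const ORIGIN = 'https://example.com'; // placeholder origin used to resolve paths

// Returns true if the request URL equals a precached URL once every query
// parameter whose name matches one of the regexes has been removed.
function matchesPrecache(requestPath, precachedPaths, ignore = [/^utm_/]) {
  const url = new URL(requestPath, ORIGIN);
  // Copy the keys first, because deleting while iterating can skip entries.
  for (const name of [...url.searchParams.keys()]) {
    if (ignore.some((re) => re.test(name))) {
      url.searchParams.delete(name);
    }
  }
  const precached = new Set(precachedPaths.map((p) => new URL(p, ORIGIN).href));
  return precached.has(url.href);
}

const precached = ['/styles/main.css'];
const results = {
  plain: matchesPrecache('/styles/main.css', precached),
  onlyUtm: matchesPrecache('/styles/main.css?utm_hello=yes', precached),
  mixed: matchesPrecache('/styles/main.css?utm_spa=neither&trees=awesome', precached),
  ignoreAll: matchesPrecache('/styles/main.css?minified=0&renew=1', precached, [/.*/]),
};
console.log(results);
```

With the default list, only the plain and utm_-only requests match. The mixed request fails because trees=awesome is left in the URL after stripping, while passing [/.*/] removes every parameter and so matches again.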

    More info about the workbox-google-analytics module can be found in the Workbox Google Analytics documentation.