Can I 'ignore' query string variables before looking up matching objects in the cache, without actually removing them from the URL the end user sees?
For example, the marketing parameters utm_source, utm_campaign, and the other utm_* values don't change the content of the page; they just vary a lot from campaign to campaign and are used by all of our client-side tracking.
So this also means that the URL can't change on the client side, but it should somehow be 'normalized' in the cache.
Essentially I want all of these...
http://example.com/page/?utm_source=google
http://example.com/page/?utm_source=facebook&utm_content=123
http://example.com/page/?utm_campaign=usa
... to all HIT the cache for http://example.com/page/
However, this URL should MISS that entry (because variation is not a utm_* param):
http://example.com/page/?utm_source=google&variation=5
It should instead be served from the cache entry for
http://example.com/page/?variation=5
Also, keep in mind that the URL the user sees must remain the same, so I can't redirect to a URL without the params or use any similar solution.
This did the trick, although it doesn't fully answer my own question: it ignores ALL query params, not just the utm ones. When I actually need a non-utm param that changes the content, I will have to revisit this regex:
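sub vcl_recv {
    # Drop the entire query string before the cache lookup.
    # The browser's address bar is unaffected because no redirect is issued.
    set req.url = regsub(req.url, "\?.*", "");
}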
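For the narrower case in the question, here is a rough sketch that removes only the utm_* params and leaves everything else in place. It is untested against a live Varnish instance. It assumes the param names are lowercase letters and underscores after the utm_ prefix, and that each one is written as name=value. As with the snippet above, the backend also receives the rewritten URL, which should be fine as long as the utm values are only read client-side.

```vcl
sub vcl_recv {
    # Assumption: utm params look like utm_<lowercase name>=<value>.
    if (req.url ~ "[?&]utm_[a-z_]+=") {
        # Remove each utm_* pair but keep the separator that preceded it.
        set req.url = regsuball(req.url, "([?&])utm_[a-z_]+=[^&]*", "\1");
        # Collapse leftover separators: "?&" becomes "?", "&&" becomes "&".
        set req.url = regsuball(req.url, "([?&])&+", "\1");
        # Drop a dangling "?" or "&" at the end of the URL.
        set req.url = regsub(req.url, "[?&]$", "");
    }
}
```

With this, /page/?utm_source=facebook&utm_content=123 should be looked up as /page/, and /page/?utm_source=google&variation=5 as /page/?variation=5.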