Tags: regex, query-string, varnish, varnish-vcl

Stripping out select querystring attribute/value pairs so varnish will not vary cache by them


My goal is to "whitelist" certain query string attributes and their values, meaning Varnish should ignore them when deciding whether two URLs map to the same cached object.

Example:

Url 1: http://foo.com/someproduct.html?utm_code=google&type=hello  
Url 2: http://foo.com/someproduct.html?utm_code=yahoo&type=hello  
Url 3: http://foo.com/someproduct.html?utm_code=yahoo&type=goodbye

In the above example I want to whitelist "utm_code" but not "type". So after the first URL is hit, I want Varnish to serve that cached content for the second URL.

The third URL, however, has a different value for "type", so it should be a cache miss.
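To make the intended behaviour concrete, here is a minimal Python sketch of the normalization being asked for. It is not VCL and is not part of the original question; the function name `cache_key_url` and the `IGNORED_PARAMS` set are hypothetical, and only the three example URLs come from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical ignore list; "utm_code" is the parameter from the example URLs.
IGNORED_PARAMS = {"utm_code"}


def cache_key_url(url: str) -> str:
    """Return the URL with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in IGNORED_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url1 = "http://foo.com/someproduct.html?utm_code=google&type=hello"
url2 = "http://foo.com/someproduct.html?utm_code=yahoo&type=hello"
url3 = "http://foo.com/someproduct.html?utm_code=yahoo&type=goodbye"

print(cache_key_url(url1) == cache_key_url(url2))  # True: same cached object
print(cache_key_url(url1) == cache_key_url(url3))  # False: "type" differs
```

Whatever the VCL ends up looking like, it should produce the same equalities for these three URLs.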

I tried the two methods below (from a Drupal help article I can't locate right now), but neither seemed to work, possibly because I have the regex wrong.

    # 1. strip out certain querystring values so varnish does not vary cache.
    set req.url = regsuball(req.url, "([\?|&])utm_(campaign|content|medium|source|term)=[^&\s]*&?", "\1");
    # get rid of trailing & or ?
    set req.url = regsuball(req.url, "[\?|&]+$", "");

    # 2. strip out certain querystring values so varnish does not vary cache.
    set req.url = regsuball(req.url, "([\?|&])utm_campaign=[^&\s]*&?", "\1");
    set req.url = regsuball(req.url, "([\?|&])foo_bar=[^&\s]*&?", "\1");
    set req.url = regsuball(req.url, "([\?|&])bar_baz=[^&\s]*&?", "\1");
    # get rid of trailing & or ?
    set req.url = regsuball(req.url, "[\?|&]+$", "");
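One way to investigate patterns like these is to replay them outside Varnish. The sketch below runs the two regexes from attempt 1 through Python's `re` module, on the assumption that Python's regex behaviour matches Varnish's PCRE for these simple patterns; the helper name `attempt_1` is hypothetical. It points at two possible reasons the attempt appeared not to work: `utm_code` from the example URLs is not in the alternation, and when two listed parameters are adjacent, the trailing `&?` consumes the separator that the next match would need.

```python
import re

# Patterns copied from attempt 1.
STRIP = r"([\?|&])utm_(campaign|content|medium|source|term)=[^&\s]*&?"
TRAILING = r"[\?|&]+$"


def attempt_1(url: str) -> str:
    """Replay attempt 1: regsuball is modelled as re.sub over all matches."""
    url = re.sub(STRIP, r"\1", url)
    return re.sub(TRAILING, "", url)


# "utm_code" is not one of campaign|content|medium|source|term: nothing happens.
print(attempt_1("/p.html?utm_code=google&type=hello"))
# /p.html?utm_code=google&type=hello

# Adjacent listed parameters: only the first one is removed.
print(attempt_1("/p.html?utm_source=a&utm_medium=b&type=hello"))
# /p.html?utm_medium=b&type=hello

# A single listed parameter at the end is removed cleanly.
print(attempt_1("/p.html?type=hello&utm_source=a"))
# /p.html?type=hello
```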

Solution

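  • I figured this out and wanted to share. I found this code, which defines a subroutine that does what I need: called from vcl_recv, it strips the unwanted parameters from req.url so the cache lookup never sees them.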

    sub vcl_recv {
    
        # strip out certain querystring params that varnish should not vary cache by
        call normalize_req_url;
    
        # snip a bunch of other code
    }
    
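    sub normalize_req_url {

        # Strip out Google Analytics campaign variables and similar tracking
        # parameters (utm_source, utm_medium, utm_campaign, gclid, ...).
        # They are only needed by the JavaScript running on the page.
        if (req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-Za-z]+)=") {
            # Remove matching parameters that follow another parameter.
            # Anchoring on "&" keeps names such as "movie" (ends in "ie") intact.
            set req.url = regsuball(req.url, "&(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-Za-z]+)=[^&]*", "");
            # Remove a matching parameter that directly follows the "?".
            set req.url = regsuball(req.url, "\?(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-Za-z]+)=[^&]*", "?");
            # Collapse a leftover "?&" into "?".
            set req.url = regsub(req.url, "\?&", "?");
        }
        # Drop a dangling "?" or "&" at the end of the URL.
        set req.url = regsub(req.url, "[?&]+$", "");
    }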
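A rewrite like this can be checked offline before it is deployed. The sketch below simulates, with Python's `re` module, a variant that anchors each parameter name to the preceding `?` or `&` and accepts any value up to the next `&`. It is an illustration under assumptions, not the Varnish implementation: `regsuball` is modelled as `re.sub`, `regsub` as `re.sub` with `count=1`, and the Python function name simply mirrors the VCL subroutine.

```python
import re

# Parameter names to strip, taken from the subroutine's alternation.
NAMES = r"(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-Za-z]+)"


def normalize_req_url(url: str) -> str:
    """Strip tracking parameters, then tidy the leftover separators."""
    if re.search(r"(\?|&)" + NAMES + "=", url):
        # Parameters that follow another parameter.
        url = re.sub("&" + NAMES + r"=[^&]*", "", url)
        # A parameter that directly follows the "?".
        url = re.sub(r"\?" + NAMES + r"=[^&]*", "?", url)
        # Collapse a leftover "?&" into "?".
        url = re.sub(r"\?&", "?", url, count=1)
    # Drop a dangling "?" or "&" at the end.
    return re.sub(r"[?&]+$", "", url, count=1)


print(normalize_req_url("/p.html?utm_source=a&utm_medium=b&type=hello"))
# /p.html?type=hello
print(normalize_req_url("/p.html?movie=abc&utm_source=x"))
# /p.html?movie=abc
print(normalize_req_url("/p.html?utm_campaign=spring-sale&type=hello"))
# /p.html?type=hello
print(normalize_req_url("/p.html?gclid=123"))
# /p.html
```

The `movie` case shows why the anchor matters: without a leading `?` or `&`, the short name `ie` would also match the tail of `movie=`. The `spring-sale` case shows why the value is matched as "anything up to the next `&`" rather than with a fixed character class.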