Varnish - purge.soft does not change TTL or anything

I'm trying to do a soft purge only for certain req.url values, all other invalidations are managed with a ban. While ban is working, purge.soft(0s,30s) does not modify anythig in the cache, TTL remains the standard (7200s) and the cache remains active. What am I doing wrong?

Full VCL code:

https://pastebin.com/QLmBh0hw

GET request log:

    *   << Request  >> 524366
-   Begin          req 524315 rxreq
-   Timestamp      Start: 1664176259.698910 0.000000 0.000000
-   Timestamp      Req: 1664176259.698910 0.000000 0.000000
-   ReqStart       172.16.1.194 23952 a0
-   ReqMethod      GET
-   ReqURL         /it-it/prodotti/catene/catene-di-luci/
-   ReqProtocol    HTTP/1.1
-   ReqHeader      X-Forwarded-For: 84.247.245.84, 130.176.221.197
-   ReqHeader      X-Forwarded-Proto: https
-   ReqHeader      X-Forwarded-Port: 443
-   ReqHeader      Host: test-prod.luminalpark.com
-   ReqHeader      X-Amzn-Trace-Id: Root=1-63315083-50faab2c2fbacda82eba1f9c
-   ReqHeader      User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36
-   ReqHeader      X-Amz-Cf-Id: ToerPgpur_0Y4SvPsU3hkSSzfK-PywWzlX-nlhnQasaji9IGXsFb5g==
-   ReqHeader      Via: 2.0 0f03c98743d9ffe79330c1f694241fc2.cloudfront.net (CloudFront)
-   ReqHeader      Cookie: _dc_gtm_UA-830149-18=1; _fbp=fb.1.1663062717375.1186628968; _ga=GA1.1.2073680112.1662976397; _ga_499EG6CXZD=GS1.1.1664175973.166.1.1664176257.0.0.0; _gcl_au=1.1.1770949614.1662976397; _gid=GA1.2.1336671677.1662976397; _hjSessionUser_21623=eyJpZCI
-   ReqHeader      Accept-Language: it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7,de;q=0.6
-   ReqHeader      Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
-   ReqHeader      Referer: https://test-prod.luminalpark.com/it-it/
-   ReqHeader      Accept-Encoding: gzip, deflate, br
-   ReqHeader      cache-control: max-age=0
-   ReqHeader      sec-ch-ua: "Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"
-   ReqHeader      sec-ch-ua-mobile: ?0
-   ReqHeader      sec-ch-ua-platform: "Windows"
-   ReqHeader      dnt: 1
-   ReqHeader      upgrade-insecure-requests: 1
-   ReqHeader      sec-fetch-site: same-origin
-   ReqHeader      sec-fetch-mode: navigate
-   ReqHeader      sec-fetch-user: ?1
-   ReqHeader      sec-fetch-dest: document
-   ReqHeader      CloudFront-Viewer-HTTP-Version: 2.0
-   ReqHeader      CloudFront-Forwarded-Proto: https
-   ReqHeader      CloudFront-Viewer-Address: 84.247.245.84:61581
-   ReqHeader      CloudFront-Viewer-TLS: TLSv1.3:TLS_AES_128_GCM_SHA256:connectionReused
-   ReqHeader      X-Cloudfront-Origin: VC5ZNQ588QNE3S
-   ReqUnset       X-Forwarded-For: 84.247.245.84, 130.176.221.197
-   ReqHeader      X-Forwarded-For: 84.247.245.84, 130.176.221.197, 172.16.1.194
-   VCL_call       RECV
-   ReqURL         /it-it/prodotti/catene/catene-di-luci/
-   ReqURL         /it-it/prodotti/catene/catene-di-luci/
-   ReqUnset       Accept-Encoding: gzip, deflate, br
-   ReqHeader      Accept-Encoding: gzip
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   Hit            884747 7171.150022 60.000000 0.000000
-   VCL_call       HIT
-   VCL_return     deliver
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     Server: nginx/1.18.0 (Ubuntu)
-   RespHeader     Content-Type: text/html; charset=UTF-8
-   RespHeader     Cache-Control: must-revalidate, public, s-maxage=7200
-   RespHeader     Date: Mon, 26 Sep 2022 07:10:30 GMT
-   RespHeader     Strict-Transport-Security: max-age=31536000; includeSubDomains
-   RespHeader     X-Frame-Options: deny
-   RespHeader     X-Content-Type-Options: nosniff
-   RespHeader     Referrer-Policy: strict-origin-when-cross-origin
-   RespHeader     sw-invalidation-states:
-   RespHeader     Set-Cookie: sw-states=deleted; expires=Sun, 26-Sep-2021 07:10:29 GMT; Max-Age=0; path=/; secure; httponly; samesite=lax
-   RespHeader     Set-Cookie: sw-cache-hash=deleted; expires=Sun, 26-Sep-2021 07:10:29 GMT; Max-Age=0; path=/; httponly
-   RespHeader     Content-Encoding: gzip
-   RespHeader     Vary: Accept-Encoding
-   RespHeader     x-url: /it-it/prodotti/catene/catene-di-luci/
-   RespHeader     X-Cacheable: YES
-   RespHeader     X-Varnish: 524366 884747
-   RespHeader     Age: 28
-   RespHeader     Via: 1.1 varnish (Varnish/6.0)
-   VCL_call       DELIVER
-   RespUnset      Cache-Control: must-revalidate, public, s-maxage=7200
-   RespHeader     Cache-Control: max-age=0, private
-   RespUnset      Set-Cookie: sw-states=deleted; expires=Sun, 26-Sep-2021 07:10:29 GMT; Max-Age=0; path=/; secure; httponly; samesite=lax
-   RespUnset      Set-Cookie: sw-cache-hash=deleted; expires=Sun, 26-Sep-2021 07:10:29 GMT; Max-Age=0; path=/; httponly
-   RespHeader     X-Cache: HIT
-   RespUnset      sw-invalidation-states:
-   RespHeader     X-Cache-Hits: 2
-   VCL_return     deliver
-   Timestamp      Process: 1664176259.698985 0.000076 0.000076
-   RespHeader     Accept-Ranges: bytes
-   RespHeader     Content-Length: 110181
-   RespHeader     Connection: keep-alive
-   Timestamp      Resp: 1664176259.699106 0.000196 0.000120
-   ReqAcct        3158 0 3158 612 110181 110793
-   End

and here's varnishlog during PURGE

curl -XPURGE http://<varnish-host>/it-it/prodotti/catene/catene-di-luci/



    *   << Request  >> 622680
-   Begin          req 622679 rxreq
-   Timestamp      Start: 1664176453.712271 0.000000 0.000000
-   Timestamp      Req: 1664176453.712271 0.000000 0.000000
-   ReqStart       172.16.2.136 47728 a0
-   ReqMethod      PURGE
-   ReqURL         /it-it/prodotti/catene/catene-di-luci/
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: 172.16.3.37
-   ReqHeader      User-Agent: curl/7.68.0
-   ReqHeader      Accept: */*
-   ReqHeader      X-Forwarded-For: 172.16.2.136
-   VCL_call       RECV
-   ReqURL         /it-it/prodotti/catene/catene-di-luci/
-   ReqURL         /it-it/prodotti/catene/catene-di-luci/
-   VCL_acl        MATCH purgers "ecommerce-node1-prod"
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       MISS
-   ReqHeader      purged: 0
-   VCL_return     synth
-   Timestamp      Process: 1664176453.712326 0.000055 0.000055
-   RespProtocol   HTTP/1.1
-   RespStatus     404
-   RespReason     Not Found
-   RespReason     Not Found
-   RespHeader     Date: Mon, 26 Sep 2022 07:14:13 GMT
-   RespHeader     Server: Varnish
-   RespHeader     X-Varnish: 622680
-   VCL_call       SYNTH
-   RespHeader     purged: 0
-   VCL_return     deliver
-   RespHeader     Content-Length: 0
-   Storage        malloc Transient
-   RespHeader     Connection: keep-alive
-   Timestamp      Resp: 1664176453.712364 0.000093 0.000038
-   ReqAcct        114 0 114 153 0 153
-   End

Solution

I've tested it myself and I'm not experiencing any issues.

The VCL code

Here's the standard vmod_purge example VCL code I took from https://varnish-cache.org/docs/6.0/reference/vmod_generated.html#example and adapted:

I removed the ACL check
I converted purge.hard() to purge.soft(0s,30s)

sub vcl_recv {
    if (req.method == "PURGE") {
        return (hash);
    }
}

sub my_purge {
    set req.http.purged = purge.soft(0s,30s);
    if (req.http.purged == "0") {
        return (synth(404));
    }
    else {
        return (synth(200));
    }
}

sub vcl_hit {
    if (req.method == "PURGE") {
        call my_purge;
    }
}

sub vcl_miss {
    if (req.method == "PURGE") {
        call my_purge;
    }
}

sub vcl_synth {
    if (req.method == "PURGE") {
        if (req.http.purged) {
            set resp.http.purged = req.http.purged;
        }
        return (deliver);
    }
}

What soft purging does

The way you've configured soft purging will ensure the object is considered stale, however because you set the grace time to 30s, async revalidation may still happen until 30 seconds past the lifetime of the object.

Example HTTP calls

Here are some example calls that illustrate what happens when soft purging takes place.

Step 1: call the endpoint and receive a fresh object

➜  ~ curl -I localhost
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Length: 595
ETag: W/"253-BCiA67uc9pD4vCc07ppQaevMbj0"
Date: Thu, 22 Sep 2022 07:51:22 GMT
X-Varnish: 65546 12
Age: 18
Via: 1.1 varnish (Varnish/6.6)
Accept-Ranges: bytes
Connection: keep-alive

As you can see we're calling the http://localhost endpoint and get a result with an Age: 18 header. This means the object has been cached for 18 seconds.

Step 2: purge

➜  ~ curl -XPURGE -I localhost
HTTP/1.1 200 OK
Date: Thu, 22 Sep 2022 07:51:42 GMT
Server: Varnish
X-Varnish: 32784
purged: 1
Content-Length: 0
Accept-Ranges: bytes
Connection: keep-alive

In step 2 we're performing the soft purge. The purged: 1 header implies that 1 object was purged.

Step 3: grace mode kicks in

➜  ~ curl -I localhost
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Length: 595
ETag: W/"253-BCiA67uc9pD4vCc07ppQaevMbj0"
Date: Thu, 22 Sep 2022 07:51:22 GMT
X-Varnish: 65552 12
Age: 26
Via: 1.1 varnish (Varnish/6.6)
Accept-Ranges: bytes
Connection: keep-alive

After the purge the we're still seeing cached content being served because the Age: 26 header that implies the object has been cached for 26 seconds.

But the output from varnishlog -g request show that the stale content is served while an asynchronous fetch happens for the new content. This is a direct result of calling purge.soft(0s, 30s):

*   << Request  >> 65552
-   Begin          req 65551 rxreq
-   Timestamp      Start: 1663833108.524685 0.000000 0.000000
-   Timestamp      Req: 1663833108.524685 0.000000 0.000000
-   VCL_use        boot
-   ReqStart       172.24.0.1 58300 http
-   ReqMethod      HEAD
-   ReqURL         /
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: localhost
-   ReqHeader      User-Agent: curl/7.79.1
-   ReqHeader      Accept: */*
-   ReqHeader      X-Forwarded-For: 172.24.0.1
-   VCL_call       RECV
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   Hit            12 -1.375188 30.000000 0.000000
-   VCL_call       HIT
-   VCL_return     deliver
-   Link           bereq 65553 bgfetch
-   Timestamp      Fetch: 1663833108.524864 0.000179 0.000179
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     Content-Type: application/json; charset=utf-8
-   RespHeader     Content-Length: 595
-   RespHeader     ETag: W/"253-BCiA67uc9pD4vCc07ppQaevMbj0"
-   RespHeader     Date: Thu, 22 Sep 2022 07:51:22 GMT
-   RespHeader     X-Varnish: 65552 12
-   RespHeader     Age: 26
-   RespHeader     Via: 1.1 varnish (Varnish/6.6)
-   VCL_call       DELIVER
-   VCL_return     deliver
-   Timestamp      Process: 1663833108.524876 0.000191 0.000011
-   Filters
-   RespHeader     Accept-Ranges: bytes
-   RespHeader     Connection: keep-alive
-   Timestamp      Resp: 1663833108.524937 0.000252 0.000061
-   ReqAcct        74 0 74 275 0 275
**  << BeReq    >> 65553
--  Begin          bereq 65552 bgfetch
--  VCL_use        boot
--  Timestamp      Start: 1663833108.524815 0.000000 0.000000
--  BereqMethod    HEAD
--  BereqURL       /
--  BereqProtocol  HTTP/1.1
--  BereqHeader    Host: localhost
--  BereqHeader    User-Agent: curl/7.79.1
--  BereqHeader    Accept: */*
--  BereqHeader    X-Forwarded-For: 172.24.0.1
--  BereqMethod    GET
--  BereqHeader    Accept-Encoding: gzip
--  BereqHeader    If-None-Match: W/"253-BCiA67uc9pD4vCc07ppQaevMbj0"
--  BereqHeader    X-Varnish: 65553
--  VCL_call       BACKEND_FETCH
--  VCL_return     fetch
--  Timestamp      Fetch: 1663833108.524843 0.000027 0.000027
--  Timestamp      Connected: 1663833108.525006 0.000191 0.000163
--  BackendOpen    31 default 172.24.2.11 80 172.24.2.14 52026 connect
--  Timestamp      Bereq: 1663833108.525047 0.000231 0.000040
--  Timestamp      Beresp: 1663833108.530071 0.005256 0.005024
--  BerespProtocol HTTP/1.1
--  BerespStatus   200
--  BerespReason   OK
--  BerespHeader   Content-Type: application/json; charset=utf-8
--  BerespHeader   Content-Length: 598
--  BerespHeader   ETag: W/"256-1rfBjX0LanGZOnmzdwpQfzIjU30"
--  BerespHeader   Date: Thu, 22 Sep 2022 07:51:48 GMT
--  BerespHeader   Connection: keep-alive
--  BerespHeader   Keep-Alive: timeout=5
--  TTL            RFC 120 10 0 1663833109 1663833109 1663833108 0 0 cacheable
--  VCL_call       BACKEND_RESPONSE
--  VCL_return     deliver
--  Timestamp      Process: 1663833108.530118 0.005302 0.000046
--  Filters
--  Storage        malloc s0
--  Fetch_Body     3 length stream
--  BackendClose   31 default recycle
--  Timestamp      BerespBody: 1663833108.530294 0.005479 0.000176
--  Length         598
--  BereqAcct      195 0 195 214 598 812
--  End

The Hit 12 -1.375188 30.000000 0.000000 line from the logs shows that the object has a remaining lifetime of -1.375188 seconds which is still more than -30 which is the grace limit.

The Link bereq 65553 bgfetch log line proves that a background fetch is made to the backend to store the latest version of the content. The transaction id in the Link tag matches the BeReq transaction that is part of the logs.

So while the new object is being fetched in transaction 65553, the response that is returned is still the old one with Age: 26 to prove it.

Step 4: next the next request gets the fresh content

Because grace mode will trigger an async fetch if there's grace time left, the current user doesn't see the impact of that fetch. However, the next user will get fresh content. The following cURL output illustrates this:

➜  ~ curl -I localhost
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Length: 598
ETag: W/"256-1rfBjX0LanGZOnmzdwpQfzIjU30"
Date: Thu, 22 Sep 2022 07:51:48 GMT
X-Varnish: 32790 65553
Age: 0
Via: 1.1 varnish (Varnish/6.6)
Accept-Ranges: bytes
Connection: keep-alive

Because Age: 0 is set, this is a brand new object. However as time progresses, the age counter will increase.

Conclusion

Soft purging with a non-zero grace value will not immediately remove an object from the cache. Instead it marks the object as expired and gives it grace time.

This ensures that the first visitor after the soft purge doesn't have to wait for the backend fetch to be completed.

It's a tradeoff between serving fresh content immediately and letting users benefit from the performance of Varnish will fetching content asynchronously and serving stale objects while that happens.

UPDATE: examining the your VCL & logs

After having examined your VCL & the VSL logs you provided, I'm assuming the creation of the caching hash differs from your GET call and your PURGE call.

Just for reference, the hash in vcl_hash is created using the URL value and the value of the Host header.

The logs indicate that you perform the GET call with the following Host header:

Host: test-prod.luminalpark.com

When you do the purge, you use the following Host header:

Host: 172.16.3.37

As this values differ, the hash will also differ. That's why the PURGE call results in a 404.

Conclusion: please use the right Host header to invalidate your content.