amazon-web-services, redis, amazon-elasticache, valkey

How to delete keys matching a pattern in AWS ElastiCache Valkey


So, I have a Node.js Lambda that saves its response in an AWS ElastiCache Valkey cache. All the keys follow the same format: getActivities:*. I'd like to clear all the keys matching this pattern.

I tried to clear the cache using Node.js, but I encountered an error: CROSSSLOT Keys in request don't hash to the same slot.

I don't think running this on Node.js is a good idea. How can I clear my cache using AWS?


Solution

  • The author's answer will work, but it is a big no-no.

    I will refer to Valkey throughout this answer, but the same is true for Redis, and Glide can be used with Redis OSS versions as well (maybe also non-OSS versions, but we don't follow those code base changes).

    On the hash tag idea:
    When you use a hash tag, all of your keys will be routed to one specific shard.
    If that is a small amount of data relative to your cluster's size, that's fine, and in some cases it is even the best practice.

    However, this is not recommended for general usage.
    The idea of clustering is to distribute your keys and network traffic between different shards. Sharding works based on a very solid hash algorithm; if you manually pin keys to one specific shard, you are creating an unbalanced cluster.
    It is OK to use hash tags, but they should be used carefully, when no other good options exist.
    In this case, there are plenty of better and healthier ways to do it.
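
    To make the slot mechanics concrete, here is a small self-contained sketch (my own illustration, not part of any client library) of how cluster slot assignment works: CRC16 of the key modulo 16384, with a `{...}` hash tag restricting hashing to the tag's content. This is why hash-tagged keys all land on one shard while untagged keys spread out:

    ```typescript
    // Sketch of Redis/Valkey cluster slot assignment.
    // Slot = CRC16(key) mod 16384; if the key contains a non-empty {tag},
    // only the tag content is hashed. ASCII keys assumed here (servers hash raw bytes).

    // CRC16/XMODEM (poly 0x1021, init 0), the variant used for cluster sharding.
    function crc16(data: string): number {
      let crc = 0;
      for (let i = 0; i < data.length; i++) {
        crc ^= data.charCodeAt(i) << 8;
        for (let bit = 0; bit < 8; bit++) {
          crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
        }
      }
      return crc;
    }

    function keySlot(key: string): number {
      const open = key.indexOf("{");
      if (open !== -1) {
        const close = key.indexOf("}", open + 1);
        // A non-empty {tag} restricts hashing to the tag content only.
        if (close > open + 1) {
          key = key.slice(open + 1, close);
        }
      }
      return crc16(key) % 16384;
    }

    // All hash-tagged keys share one slot (and therefore one shard):
    console.log(keySlot("{getActivities}:1") === keySlot("{getActivities}:2")); // true
    // Without a tag, keys spread across the 16384-slot space:
    console.log(keySlot("getActivities:1"), keySlot("getActivities:2"));
    ```

    Pinning everything under one tag is exactly what concentrates all the load on a single shard.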

    For the KEYS usage:
    See the warning in the command documentation: keys warning

    Using KEYS will block the node until the command has found all the matching keys on the node.
    It will hurt performance and create unresponsiveness, potentially leading to a connection storm, which drives network load so high that the server never gets back to being responsive.

    While ElastiCache has ways to deal with such mistakes and misbehavior, at the end of the day it cannot force users to behave in a certain way, and if you decide to crash it, at some point you'll cross what the system can defend against.

    There are also client libraries with fault-tolerant mechanisms that defend your system against connection storms and deal better with crashes or blocked nodes, and it's recommended to use them.
    On that front, valkey-glide proves to be the best for fault tolerance, since it was designed based on years of customers' pains and issues; see valkey-glide. But I'm a Glide maintainer, so don't take my word for it: take it for a ride, crash some nodes, do some manual failovers, and see if I'm being real.
    But even with the best client (valkey-glide) and ElastiCache's defense mechanisms, machines have limitations; if you deliberately push past a machine's limits, it will crash.

    In some cases, KEYS by itself may crash your node by maxing out its CPU. It is not a production command!

    So what should you do instead:

    Method 1 - using a client library:
    Use a client library with cluster connection support.
    Again, I recommend valkey-glide.
    The explanation below uses valkey-glide, but you can implement a similar mechanism using other clients as well. The code sample is in TS; at the time of writing, Glide supports Python, Java, and Node.js, Go is going to public preview in about 2 weeks and is planned for GA in two months, Ruby and C++ are under development, and C# is planned for around Aug '25. But don't hold me to the exact dates.

    Glide has a Cluster Scan feature, which gives you the same guarantees as SCAN.
    At the time of writing, an OOTB cluster scan is under development by Valkey, and is not implemented by Redis, as far as I know.
    Other client libraries also offer a cluster scan feature, but based on my research I couldn't find one with a mechanism that guarantees you get all the keys (I implemented Cluster Scan in Glide and did extensive research into whether a good solution already existed, but maybe there is one).

    Using Cluster Scan lets you iterate over the keys in your cluster, filtered by a pattern; you can further narrow the scan by limiting it to a specific data type if you are looking for one.

    On each iteration of the scan, you just pass the results to DEL, until the iteration finishes. You can pass a higher COUNT parameter if you want it to go faster, but it is very fast anyway, a matter of single- or double-digit milliseconds.

    You don't need a hash tag: given the pattern, the cluster scan iterates over all the shards by itself.

    A note specifically for Lambda, or any read-only file system. If that's not your case, just skip to the code sample.
    If you are using Glide on Lambda, whose file system is read-only, you need to turn the logger off first, since by default Glide sets up a logger at WARN level that logs to a file.
    You can change it to log to stdout instead, and integration with CloudWatch and similar services is a planned feature. On a read-only file system that lets you pass environment variables, if you still want a log file, you can set the XDG_RUNTIME_DIR env variable to /dev/shm.

    The code sample:

    import { GlideClusterClient, ClusterScanCursor, GlideString, Logger } from "@valkey/valkey-glide";
    
    Logger.init("off"); // For disabling logging on read-only file systems
    
    // Create the cluster client. You can pass one or more node addresses; either way, Glide will discover the rest of the cluster.
    const client = await GlideClusterClient.createClient({
      addresses: [
        {
          host: "myclustername.xxxxxx.cfg.usw2.cache.amazonaws.com", // In the case of AWS ElastiCache, use the configuration endpoint of the cluster
          port: 6379 // Replace with the port of the cluster
        }
      ],
      // useTLS: true if using TLS
      // requestTimeout: timeout in milliseconds - recommended to set a value that fits the use case
    });
    
    let cursor = new ClusterScanCursor();
    let keys: GlideString[] = [];
    
    // The scan iterates over the whole cluster, so there is no need for a hash tag pinning keys to one shard; just provide a pattern in the match parameter.
    while (!cursor.isFinished()) {
      [cursor, keys] = await client.scan(cursor, {
        match: "getActivities:*", count: 10
      });
      if (keys.length > 0) {
        await client.del(keys);
      }
    }
    

    Method 2 - using valkey-cli or the "connect to cache" feature of ElastiCache:
    The best solution I was able to find is to connect with valkey-cli in cluster mode; with the ElastiCache "connect" feature this happens automatically, and with a local valkey-cli you add -c to the connection command.
    A short explanation of what happens:
    CLUSTER NODES retrieves all the node IDs.
    GET {master-id} will move you to the master you want. Do this master after master until you have cleaned the whole cluster.
    The EVAL is a "Valkey script" that retrieves results from the node using SCAN and deletes the returned values (I wrote this one; you can write your own for any reason you like).
    The EVAL script could be written to loop over all the keys without you iterating, just like any script, but doing so you'd end up blocking the server, just like with KEYS. So work a bit harder and iterate manually.
    On each iteration, replace the cursor with the returned one, until it returns zero again; then repeat the process on the next master.

    # Run CLUSTER NODES to get all the node IDs
    CLUSTER NODES
    3eff73b9e0e46d8c7970d9dc5e9fa79e1173b02d clustername-0003-002.clustername.***.use2.cache.amazonaws.com:6379@1122 slave a2c285fec7283b56d54f13e11195dcab8ebb16c3 0 1737819161000 0 connected
    1953e97956ea213c3d2782c6a438d961429ac994 clustername-0002-001.clustername.***.use2.cache.amazonaws.com:6379@1122 master - 0 1737819160000 3 connected 5462-10922
    dd003f7bed47b73c4eb1375842160d4d7c39c766 clustername-0001-002.clustername.***.use2.cache.amazonaws.com:6379@1122 slave 5159451a2251b9b0fc61554a7a2ccae36969d9e8 0 1737819160783 1 connected
    9658871c7c6409d0b411cce702e316410f97eaab clustername-0001-003.clustername.***.use2.cache.amazonaws.com:6379@1122 slave 5159451a2251b9b0fc61554a7a2ccae36969d9e8 0 1737819161786 1 connected
    76a775f6cd1653d71ca25c4cb84190eb32921455 clustername-0002-003.clustername.***.use2.cache.amazonaws.com:6379@1122 slave 1953e97956ea213c3d2782c6a438d961429ac994 0 1737819160000 3 connected
    a2c285fec7283b56d54f13e11195dcab8ebb16c3 clustername-0003-001.clustername.***.use2.cache.amazonaws.com:6379@1122 myself,master - 0 0 0 connected 10923-16383
    5159451a2251b9b0fc61554a7a2ccae36969d9e8 clustername-0001-001.clustername.***.use2.cache.amazonaws.com:6379@1122 master - 0 1737819162789 1 connected 0-5461
    02dce0129f2cc72823301071ebe2c3189a80fe8f clustername-0003-003.clustername.***.use2.cache.amazonaws.com:6379@1122 slave a2c285fec7283b56d54f13e11195dcab8ebb16c3 0 1737819159000 0 connected
    3678acf328006bfbaf95285c86e81d592f78bdfc clustername-0002-002.clustername.***.use2.cache.amazonaws.com:6379@1122 slave 1953e97956ea213c3d2782c6a438d961429ac994 0 1737819161000 3 connected
    
    # For each of the masters, use the master ID to switch the connection and run the commands below
    GET 1953e97956ea213c3d2782c6a438d961429ac994 # The ID of one of the masters returned by CLUSTER NODES; this switches the connection to that master
    EVAL "local cursor = tonumber(ARGV[1]); local pattern = ARGV[2]; local count = tonumber(ARGV[3]); local keys = redis.call('SCAN', cursor, 'MATCH', pattern, 'COUNT', count); cursor = tonumber(keys[1]); for i, key in ipairs(keys[2]) do redis.call('DEL', key) end; return cursor;" 0 0 "pattern" 100
    # Change the 0 0 "pattern" 100 at the end: the second 0 is replaced on each iteration by the returned cursor, the pattern by the pattern you want, and the 100 by the count you want.
    (integer) 13
    

    Repeat until you get (integer) 0, then move to the next master.