c# .net redis stackexchange.redis

Why does the KeysAsync loop hang when scanning Redis keys with StackExchange.Redis?


I'm trying to scan all Redis keys that match a specific pattern using the StackExchange.Redis library. However, my code hangs at the await foreach statement while scanning the keys. Here’s what I have:

public class DataMigration(IConnectionMultiplexer connection) : IHostedService
{
    public async Task StartAsync(CancellationToken cancellationToken)
    {
        const string pattern = "{hangfire}:recurring-job:EmailNotificationJob:*";
        List<RedisKey> keys = [];

        var db = connection.GetDatabase();

        foreach (var endpoint in connection.GetEndPoints())
        {
            var server = connection.GetServer(endpoint);

            if (!server.IsConnected || server.IsReplica)
            {
                continue;
            }

            // Using SCAN for pagination
            await foreach (var key in server.KeysAsync(database: db.Database, pattern: pattern).WithCancellation(cancellationToken))
            {
                keys.Add(key); // this is never reached
            }
        }

        Console.WriteLine($"Found {keys.Count} keys to update."); // this is never reached
    }

    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}

var builder = Host.CreateApplicationBuilder(args);

builder.Services.AddSingleton<IConnectionMultiplexer>(serviceProvider =>
{
    var loggerFactory = serviceProvider.GetRequiredService<ILoggerFactory>();

    const string redisHost = ...
    const int redisPort = ...
    const string redisPassword = ...

    var redisConfiguration = new ConfigurationOptions
    {
        EndPoints = { $"{redisHost}:{redisPort}" },
        Password = redisPassword,
        Ssl = true,
        AbortOnConnectFail = false, // Keep trying to connect
        ConnectTimeout = 15000, // 15 seconds timeout
        SyncTimeout = 15000, // 15 seconds timeout for synchronous operations
        LoggerFactory = loggerFactory
    };

    return ConnectionMultiplexer.Connect(redisConfiguration);
});

builder.Services.AddHostedService<DataMigration>();

var app = builder.Build();

app.Run();

Solution

  • The first thing to check is whether anything is happening behind the scenes. The easiest way to do this is to run MONITOR in a redis-cli session and watch the ongoing traffic: I would expect to see a flurry of SCAN ... operations being issued, each with a different cursor token. If the server is very busy, MONITOR may not be practical; an alternative is to use RESP logging at the client.

    Note: a filtered SCAN can require a great many operations before it finds a match if the database is very large and has very few matches. Redis cannot jump straight to the matches; it walks the keyspace page by page, so it may make many round trips that return zero matches before it finds the first one (if any!).

    If this is the problem, you may be able to significantly reduce the time taken by increasing the page size, i.e. the amount of work done per round trip, noting that a given page still might not contain any matches. So instead of checking 50 keys before yielding, you could check 500 keys, or 5000 keys. There is an optional pageSize parameter on the API for this.
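    As a minimal sketch of the page-size suggestion, assuming the same server, database number and pattern as in the question: the helper name ScanKeysAsync and the value 5000 are illustrative assumptions, not recommendations, and the right page size depends on your data and server load.

    ```csharp
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;
    using StackExchange.Redis;

    public static class KeyScanner
    {
        // Hypothetical helper: collects all keys matching the pattern,
        // asking the server to examine more keys per SCAN round trip.
        public static async Task<List<RedisKey>> ScanKeysAsync(
            IServer server, int database, string pattern, CancellationToken cancellationToken)
        {
            var keys = new List<RedisKey>();

            // pageSize is passed to SCAN as its COUNT hint; 5000 is an assumed value.
            await foreach (var key in server
                .KeysAsync(database: database, pattern: pattern, pageSize: 5000)
                .WithCancellation(cancellationToken))
            {
                keys.Add(key);
            }

            return keys;
        }
    }
    ```

    A larger page size means fewer round trips, but each SCAN call keeps the server busy for longer, so it is worth increasing it gradually.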

    Another possibility is that you have an old server version (one that predates SCAN) and the library is falling back to KEYS. In that scenario the same problems apply, except that the KEYS operation also blocks the server from processing concurrent requests on other connections; not good!
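    One way to check which command to expect is to ask the client what it knows about the server. This is a rough sketch, assuming the IServer instance from the question; the class and method names are made up for illustration.

    ```csharp
    using System;
    using StackExchange.Redis;

    public static class ServerDiagnostics
    {
        // Hypothetical helper: prints the server version and whether the
        // client believes cursor-based SCAN is available on it.
        public static void PrintScanSupport(IServer server)
        {
            Console.WriteLine($"Endpoint: {server.EndPoint}");
            Console.WriteLine($"Version: {server.Version}");
            Console.WriteLine($"SCAN available: {server.Features.Scan}");
        }
    }
    ```

    If SCAN is reported as unavailable, a single long-running KEYS in the MONITOR output would be the expected symptom.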

    If you do not see either a flurry of SCAN operations, or a single long-running KEYS operation... maybe let me know what you do see?

    As a side note: crawling the redis keyspace via SCAN or KEYS is not efficient, and you should avoid needing to do this as part of routine application logic. It is appropriate for admin purposes, for example reviewing or categorising your database to see what flavors of data exist and how much space each is taking. For application logic, you should usually maintain your own index using the redis API so that you don't need to crawl; for example, keep a hash or set of the keys that apply to some rule.
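The indexing idea in the side note can be sketched roughly as follows. This assumes you control the code that writes the keys (which may not be the case for keys written by a third-party library). The index key name and the helper names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public static class JobKeyIndex
{
    // Hypothetical set that records every key belonging to the rule.
    // The {hangfire} hash tag keeps it in the same cluster slot as the data keys.
    private const string IndexKey = "{hangfire}:index:EmailNotificationJob";

    // Write the value and record its key in the index set.
    public static async Task WriteAsync(IDatabase db, RedisKey key, RedisValue value)
    {
        await db.StringSetAsync(key, value);
        await db.SetAddAsync(IndexKey, key.ToString());
    }

    // Delete the value and drop its key from the index set.
    public static async Task DeleteAsync(IDatabase db, RedisKey key)
    {
        await db.KeyDeleteAsync(key);
        await db.SetRemoveAsync(IndexKey, key.ToString());
    }

    // Read the known keys directly from the index; no keyspace crawl needed.
    public static async Task<string[]> ListKeysAsync(IDatabase db)
    {
        RedisValue[] members = await db.SetMembersAsync(IndexKey);
        return Array.ConvertAll(members, m => m.ToString());
    }
}
```

The two writes in each method are not atomic in this sketch; if the index must never drift from the data, wrapping them in a transaction or a Lua script is one option.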