redis, nestjs, locking

Redlock lock key is sometimes acquired by multiple instances at the same time


In NestJS, I have this cron job that I want to execute only once per tick when multiple instances of the app are running, using Redlock:

    @Cron('*/1 * * * * *')
    async test(): Promise<void> {

        try {
            const lockKey = 'test-cron-job-lock';
            const lock = await this.redisService.acquireLock(lockKey, 1000); // 1 second
            this.logger.log(`Cron job at ${this.getTimeWithoutMilliseconds()}`);

            await this.redisService.releaseLock(lock);
        } catch (e) {
            this.logger.error('Error in test cron job');
        }
    }

It works, but sometimes the lock key gets acquired by multiple instances at the same time.

App A:

[Nest] 8872  - 10/04/2024, 4:06:48 PM     LOG Cron job at 2024-10-04T13:06:48Z
[Nest] 8872  - 10/04/2024, 4:06:49 PM   ERROR Error in test cron job
[Nest] 8872  - 10/04/2024, 4:06:50 PM   ERROR Error in test cron job
[Nest] 8872  - 10/04/2024, 4:06:51 PM     LOG Cron job at 2024-10-04T13:06:51Z
[Nest] 8872  - 10/04/2024, 4:06:52 PM     LOG Cron job at 2024-10-04T13:06:52Z
[Nest] 8872  - 10/04/2024, 4:06:53 PM     LOG Cron job at 2024-10-04T13:06:53Z

App B:

[Nest] 8886  - 10/04/2024, 4:06:48 PM   ERROR Error in test cron job
[Nest] 8886  - 10/04/2024, 4:06:49 PM     LOG Cron job at 2024-10-04T13:06:49Z
[Nest] 8886  - 10/04/2024, 4:06:50 PM     LOG Cron job at 2024-10-04T13:06:50Z
[Nest] 8886  - 10/04/2024, 4:06:51 PM   ERROR Error in test cron job
[Nest] 8886  - 10/04/2024, 4:06:52 PM   ERROR Error in test cron job
[Nest] 8886  - 10/04/2024, 4:06:53 PM     LOG Cron job at 2024-10-04T13:06:53Z

As you can see, both instances ran the job at 2024-10-04T13:06:53Z. Why?


Solution

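  • Race conditions with Redlock are generally rare, but since every instance fires the cron at exactly the same moment and the lock TTL is very short (1 second), an occasional double run is possible. The job also releases the lock as soon as it has logged, so an instance whose timer fires a few milliseconds later can find the key free and acquire it too.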

    There are a few ways to solve it, each with pros and cons:

    1. Use a longer lock TTL. Not perfect, but it reduces the chance of a race.
    2. Add jitter: a random delay before acquiring. Make sure the delay is not longer than the TTL, otherwise you'll get a double run.
    3. Use SET with NX and EX (the TTL) after acquiring, something like SET running 1 NX EX 10. If the key is already set you get null, otherwise you get OK. This is an atomic operation, so a race is not possible. In this case, though, I would drop the lock and use SET alone. EX makes sure the key expires on its own after a while.
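
    Option 3 can be sketched roughly as follows. This is a minimal illustration, not the asker's redisService: the SetNxClient interface, tryClaimRun, and InMemoryClient are hypothetical names. The interface assumes a client whose set call accepts (key, value, 'EX', seconds, 'NX') and resolves to 'OK' or null, which is the shape ioredis is expected to expose. The in-memory class is only a stand-in so the behavior can be tried without a Redis server.

```typescript
// Hypothetical minimal shape of the one Redis call this sketch needs.
interface SetNxClient {
  set(
    key: string,
    value: string,
    exToken: 'EX',
    seconds: number,
    nxToken: 'NX',
  ): Promise<'OK' | null>;
}

// Try to claim the right to run. Resolves to true for exactly one caller
// per TTL window, because SET ... NX only succeeds if the key is absent.
// The key is never deleted by hand; it expires on its own via EX.
async function tryClaimRun(
  client: SetNxClient,
  key: string,
  owner: string,
  ttlSeconds: number,
): Promise<boolean> {
  const reply = await client.set(key, owner, 'EX', ttlSeconds, 'NX');
  return reply === 'OK';
}

// In-memory stand-in that mimics SET key value EX seconds NX for a single
// process. It exists only for experimenting with the function above.
class InMemoryClient implements SetNxClient {
  private readonly expiresAt = new Map<string, number>();

  async set(
    key: string,
    _value: string,
    _exToken: 'EX',
    seconds: number,
    _nxToken: 'NX',
  ): Promise<'OK' | null> {
    const now = Date.now();
    const expiry = this.expiresAt.get(key);
    if (expiry !== undefined && expiry > now) {
      return null; // key still alive: somebody else already claimed it
    }
    this.expiresAt.set(key, now + seconds * 1000);
    return 'OK';
  }
}
```

    In the cron handler, the job body would run only when tryClaimRun resolves to true. Because the key is left to expire instead of being released, an instance that arrives a few milliseconds late still sees it and skips the run.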

    Generally I would say that Redlock fits cases where you can't predict behavior and you want to make sure that only one client ever holds the lock. But if all instances start at the same time, take about the same time to run, and you need something fully atomic, I'd use SET key value NX EX seconds with a slight jitter, and you are pretty much bulletproof.
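
    A rough sketch of the jitter part, under stated assumptions: jitterDelayMs, sleep, and runOncePerTick are hypothetical helpers, and capping the delay at half the TTL is an arbitrary choice made here to stay well inside the "not longer than the TTL" rule. The claim callback stands for whatever atomic check is used, for example a SET key value NX EX seconds call.

```typescript
// Pick a random delay in [0, ttlMs / 2). The half-TTL cap is an assumption
// of this sketch; any bound below the TTL keeps late instances inside the
// window in which the key still exists.
function jitterDelayMs(ttlMs: number, random: () => number = Math.random): number {
  if (ttlMs <= 0) {
    throw new RangeError('ttlMs must be positive');
  }
  return Math.floor(random() * (ttlMs / 2));
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Wait a random amount, then run the job only if the claim succeeds.
// Resolves to true when this instance ran the job, false when it skipped.
async function runOncePerTick(
  claim: () => Promise<boolean>,
  job: () => Promise<void>,
  ttlMs: number,
): Promise<boolean> {
  await sleep(jitterDelayMs(ttlMs));
  if (!(await claim())) {
    return false;
  }
  await job();
  return true;
}
```

    The random source is injectable so the delay can be checked deterministically, for instance by passing () => 0 to get no delay at all.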