rust program runs incredibly slow

I have written the following program out of curiousity. It ticks X amount of times every second, and it should be very close to the aimed tick count, but somehow it always lacks behind a lot.

For example for a 2 second benchmark the output is (it should be 2*64 -> 128). Building in release mode doesn't help the issue:

Starting benchmark 2s with length
Benchmark started.
Ticked 83 in 2 seconds

It is the same for other lenghts, for example 5 second yields 196 which should be 320. It should be actually more cause I let it run a little bit more.

Is it because of Arc Mutex being slow, or is there anything more to this? Where is the bottleneck?

use std::{ time::{ SystemTime, UNIX_EPOCH }, sync::Arc };

use tokio::{ time::{ sleep, Duration }, sync::Mutex };

const TICK_COUNT: u64 = 64;
const BENCHMARK_LENGTH: u64 = 2;

async fn main() {
    let tick_count = 0;
    let interval = 1000 / TICK_COUNT;

    let shared = Arc::new(Mutex::new(tick_count));

    let start_time = SystemTime::now();

    println!("Starting benchmark {}s with length", BENCHMARK_LENGTH);

    let displayed = shared.clone();
    let incrementer = shared.clone();

    tokio::spawn(async move {
        loop {
                start_time.duration_since(UNIX_EPOCH).unwrap() +
                    Duration::from_secs(BENCHMARK_LENGTH) <
                println!("Ticked {} in {} seconds", displayed.lock().await, BENCHMARK_LENGTH);



    tokio::spawn(async move {
        loop {

            *incrementer.lock().await += 1;

    let _ = tokio::spawn(async move {
        println!("Benchmark started.");
        loop {


  • I have tried your program on my computer. Duplicating the *incrementer.lock().await += 1; statements does not change the behaviour (the count is simply multiplied consequently), thus the slowdown is not due to the Mutex. Moreover, I replaced the Mutex by an atomic integer, and the behaviour is still the same.

    cargo flamegraph reports a lot of time spent in a system-call below ThreadParker::futex_wait() (of parking_lot), which confirms the intuition I expressed in the comment: most of the time is spent in the OS suspending/resuming the underlying thread of the tokio runtime.

    About the numerical values, I obtain 122 ticks in 2s, not far from the 128 expected. With 128 TICK_COUNT, I obtain 240 in 2s instead of 256 and with 256 TICK_COUNT, I obtain 466 in 2s instead of 512. This is not ideal, but not very far from the expected values. Maybe your OS is not as reactive as mine? (I use a quite usual Linux)