performance, go, rust, llvm-codegen

Why is swapping elements of a []float64 in Go faster than swapping elements of a Vec<f64> in Rust?


I have two (equivalent?) programs, one in Go, the other in Rust. The average execution times are:

  • Go ~169ms
  • Rust ~201ms

Go

package main

import (
    "fmt"
    "time"
)

func main() {
    work := []float64{0.00, 1.00}
    start := time.Now()

    for i := 0; i < 100000000; i++ {
        work[0], work[1] = work[1], work[0]
    }

    elapsed := time.Since(start)
    fmt.Println("Execution time: ", elapsed)
}

Rust

I compiled with --release

use std::time::Instant;

fn main() {
    let mut work: Vec<f64> = Vec::new();
    work.push(0.00);
    work.push(1.00);

    let now = Instant::now();

    for _x in 1..100000000 {
        work.swap(0, 1); 
    }

    let elapsed = now.elapsed();
    println!("Execution time: {:?}", elapsed);
}

Is Rust less performant than Go in this instance? Could the Rust program be written in an idiomatic way, to execute faster?


Solution

  • Could the Rust program be written in an idiomatic way, to execute faster?

    Yes. To create a vector with a few elements, use the vec![] macro:

    let mut work: Vec<f64> = vec![0.0, 1.0];    
    
    for _x in 1..100000000 {
        work.swap(0, 1); 
    }
    

    So is this code faster? Yes. Have a look at what assembly is generated:

    example::main:
      mov eax, 99999999
    .LBB0_1:
      add eax, -1
      jne .LBB0_1
      ret
    

    On my PC, this runs about 30 times faster than your original code.

    Why does the assembly still contain this loop that is doing nothing? Why isn't the compiler able to see that two pushes are the same as vec![0.0, 1.0]? Both very good questions and both probably point to a flaw in LLVM or the Rust compiler.

    However, sadly, there isn't much useful information to gain from your micro benchmark. Benchmarking is hard, like really hard. There are many pitfalls that even professionals fall into. In your case, the benchmark is flawed in several ways. For a start, you never observe the contents of the vector after the loop (it is never used). That's why a good compiler can remove all code that touches the vector (as the Rust compiler did above), so what you end up timing is an empty loop rather than the swaps.
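    One way to make the vector "observed" is sketched below. It passes the vector through std::hint::black_box (stable since Rust 1.66) on every iteration and prints the result afterwards. The function name run_swaps is made up for this illustration, and black_box is only a best-effort hint to the optimizer, so treat this as a sketch under those assumptions rather than a guaranteed fix for the benchmark.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: swaps the two elements `iterations` times.
/// The vector is passed through `black_box` after each swap so the
/// optimizer has to assume its contents may be read at that point.
fn run_swaps(iterations: u32) -> Vec<f64> {
    let mut work: Vec<f64> = vec![0.0, 1.0];
    for _ in 0..iterations {
        work.swap(0, 1);
        // Best-effort hint only: the compiler is asked, not forced,
        // to treat `work` as used here.
        black_box(&mut work);
    }
    work
}

fn main() {
    let now = Instant::now();
    // 99_999_999 is the number of iterations of `1..100000000`.
    let work = run_swaps(99_999_999);
    let elapsed = now.elapsed();
    // Printing the vector observes its contents after the loop.
    println!("Result: {:?}, execution time: {:?}", work, elapsed);
}
```

    As with the original program, this should be compiled with --release before reading anything into the timing.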

    Apart from that, this does not resemble any real performance-critical code. Even if the vector were observed later, swapping an odd number of times equals a single swap. So unless you wanted to see if the optimizer could understand this swapping rule, sadly your benchmark isn't really useful.
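    The swapping rule mentioned above can be checked with a small sketch. The helper swap_n_times is a hypothetical name introduced only for this illustration, and the counts 7 and 8 are arbitrary small stand-ins for "odd" and "even".

```rust
/// Hypothetical helper: swaps elements 0 and 1 of `work` exactly `n` times.
fn swap_n_times(work: &mut [f64], n: u32) {
    for _ in 0..n {
        work.swap(0, 1);
    }
}

fn main() {
    // Reference: a single swap.
    let mut once = vec![0.0, 1.0];
    once.swap(0, 1);

    // An odd number of swaps ends in the same state as one swap.
    let mut odd = vec![0.0, 1.0];
    swap_n_times(&mut odd, 7);
    assert_eq!(odd, once);

    // An even number of swaps restores the starting order.
    let mut even = vec![0.0, 1.0];
    swap_n_times(&mut even, 8);
    assert_eq!(even, vec![0.0, 1.0]);

    println!("odd: {:?}, even: {:?}", odd, even);
}
```

    So an optimizer that understood this rule could, in principle, replace the whole loop with at most one swap, which is another reason the loop says little about real swapping cost.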