Why does iteration over an inclusive range generate longer assembly in Rust than in C++?

These two loops are supposed to be equivalent in C++ and Rust:

#include <cstdint>

std::uint64_t sum1(std::uint64_t n) {
    std::uint64_t sum = 0;
    for (std::uint64_t j = 0; j <= n; ++j) {
        sum += 1;
    }
    return sum;
}

pub fn sum1(num: u64) -> u64 {
    let mut sum: u64 = 0;
    for j in 0u64..=num {
        sum += 1;
    }
    return sum;
}

However the C++ version generates a very terse assembly:

sum1(unsigned long):                               # @sum1(unsigned long)
        xor     eax, eax
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        add     rax, 1
        cmp     rax, rdi
        jbe     .LBB0_1
        ret

while Rust's version is very long with two checks in the loop instead of one:

example::sum1:
        xor     eax, eax
        xor     ecx, ecx
.LBB0_1:
        mov     rdx, rcx
        cmp     rcx, rdi
        adc     rcx, 0
        add     rax, 1
        cmp     rdx, rdi
        jae     .LBB0_3
        cmp     rcx, rdi
        jbe     .LBB0_1
.LBB0_3:
        ret

Godbolt: https://godbolt.org/z/xYW94qxjK

What is Rust intrinsically trying to prevent that C++ is carefree about?

Solution

Overflow in the iterator state.

The C++ version will loop forever when given a large enough input:

#include <cstdint>

std::uint64_t sum1(std::uint64_t n) {  
    std::uint64_t sum = 0;
    for (std::uint64_t j = 0; j <= n; ++j) {
        __asm__ ("");
        sum += 1;
    }
    return sum;
}

#include <iostream>

int main() {
    std::cout << sum1(UINT64_C(0xffffffff'ffffffff)) << std::endl;

    return 0;
}

This is because when the loop counter j finally reaches 0xffffffff'ffffffff, incrementing it wraps around to 0, which means the loop invariant j <= n is always fulfilled and the loop never exits.

Strictly speaking, invoking the original version of sum1 with 0xffffffff'ffffffff infamously triggers undefined behaviour, though not because of overflow alone, but since infinite loops without externally-visible side effects are UB ([intro.progress]/1). This is why in my version I added an empty __asm__ statement to the function to act as a dummy ‘side effect’ preventing the compiler from taking ‘advantage’ of that in optimisation passes.

The Rust version, on the other hand, is not only perfectly well-defined, but iterates exactly as many times as the cardinality of the range:

use std::num::Wrapping;

fn sum1(num: u64) -> u64 {
    let mut sum = Wrapping(0);
    for _ in 0..=num {
        sum += Wrapping(1);
    }
    return sum.0;
}

fn main() {
    println!("{}", sum1(0xffffffff_ffffffff));
}

The above program (slightly modified to avoid getting bogged down in debug versus release mode differences with respect to the summation) will terminate after exactly 18 446 744 073 709 551 616 iterations and print 0; the Rust version carefully maintains iterator state so that overflow does not happen in the iterator.