Return value from loop expression with break


Example 1

fn five() -> i32 {
    5   // ; not allowed I understand why
}

fn main() {
    let x = five();
    println!("The value of x is: {x}");
}

Example 2 (from https://doc.rust-lang.org/stable/book/ch03-05-control-flow.html)

fn main() {
    let mut counter = 0;

    let result = loop {
        counter += 1;

        if counter == 10 {
            break counter * 2;
        }
    };
    println!("The result is {result}");
}

I understand why in Example 1 it must be 5 and not 5;, but I am confused by Example 2 and have a few questions.

Question 1:

Why is there a ; after break counter * 2? It works without the ;, so why is it there? Is it a Rust convention, or is there a technical reason?

Question 2:

If I do break; counter * 2; it will not return a value. What is the difference between break; counter * 2; and break counter * 2;?
Why does the second one work?

Question 3:

If I do:

break counter * 2
println!("After break");

the compile error is: error: expected ;, found println

If I do:

break counter * 2;
println!("After break");

there is no more compile error, but:

15 |             println!("After break");
   |             ^^^^^^^^^^^^^^^^^^^^^^^ unreachable statement

But at least I understand this.
What I do not understand is why break counter * 2 works fine on its own, but adding something after it gives a compile error.

To be honest, Example 2 confuses me. My understanding is that to return a value from an expression, the last line should have no ; (as in Example 1), but Example 2 clearly shows otherwise.


Solution

  • Rust is a very expression-oriented language. And expressions, crucially, evaluate to values. When you write a function, its body is a block, and a block is an expression. That block can contain several statements, each ended by a semicolon, followed by an optional final expression.

    This is where Rust diverges from most other C-derived languages. Expressions are the driver in Rust, and semicolons separate statements. So a valid expression is { a ; b ; c ; d }, where d is the eventual result. a, b, and c are mere side effects. In C, by contrast, a function is a sequence of statements terminated by semicolons, and statements contain expressions. So in C, a function body might look like { a ; b ; c ; d ; }, where each statement is executed for side effects, and one of them might happen to be a return statement, but it's still a statement.

    If a sequence of expressions in Rust ends in a semicolon, the block has no final expression and its value is (). You can think of it as Rust inserting an extra () at the end, so { a ; b ; c ; d ; } behaves like { a ; b ; c ; d ; () }. This is why we don't have to write () at the end of all of our unit-returning functions. It's just the default.
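
    As a minimal sketch of the rule above, here are two blocks that differ only in how they end. The function names with_tail and without_tail are made up for this illustration.

    ```rust
    // No semicolon after the last expression: the block evaluates to it.
    fn with_tail() -> i32 {
        let a = 2;
        a + 3
    }

    // The last line ends in a semicolon, so there is no final expression
    // and the block (and therefore the function) evaluates to ().
    fn without_tail() {
        let a = 2;
        let _ = a + 3;
    }

    fn main() {
        let x: i32 = with_tail();
        let y: () = without_tail();
        assert_eq!(x, 5);
        assert_eq!(y, ());
    }
    ```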

    It's a more functional way of looking at things. A function returns a value, and whatever else happens is a side effect. The "usual" return value at the end of a function is simply that value, as an expression.

    Now, because Rust supports a more imperative style (and because it's often useful and convenient), Rust also supports break and return, which leave the usual flow of control early. Syntactically these are expressions too, but they are used for their control-flow effect. The value you attach to them is handed directly to the enclosing loop or function; it is not the "usual" value of the block they appear in.

    let result = loop {
      counter += 1;
      if counter == 10 {
        break counter * 2;
      }
    };
    

    The body of the loop is a block, and like most things in Rust, a block is an expression, so it has a value. For a loop body that value must be (), and it is discarded, since the loop is just going to run again. In this case, the block is equivalent to

    let result = loop {
      counter += 1;
      if counter == 10 {
        break counter * 2;
        ()
      } else {
        () // 'if' can also insert () in the else block when used as an expression
      }
    };
    

    where every implicit () is written out explicitly. If you remove the semicolon, you get

    let result = loop {
      counter += 1;
      if counter == 10 {
        break counter * 2
      } else {
        () // 'if' can also insert () in the else block when used as an expression
      }
    };
    

    break, as an expression, has a type as well. That type is the diverging type, called never or !. Since break is guaranteed to diverge (i.e. to exit the usual flow of control), it never actually produces a value, and ! can be coerced to any other type. So the result of this if expression is still (), since ! coerces to (). This is all moot, of course, since the loop will just run again if not broken, but that's what Rust is reasoning about internally.
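
    The coercion can be seen in a small sketch. The helper names count_to and first_even are hypothetical and only serve this example; in the first, ! stands in for (), and in the second, continue (which also has type !) stands in for an i32.

    ```rust
    fn count_to(limit: u32) -> u32 {
        let mut counter = 0;
        loop {
            counter += 1;
            // No semicolon: the value of the `if` block is the `break`
            // expression itself, whose type `!` coerces to the `()`
            // that an `if` without `else` requires.
            if counter == limit {
                break counter
            }
        }
    }

    fn first_even(values: &[i32]) -> Option<i32> {
        for &v in values {
            // The first arm is an i32; `continue` has type `!`,
            // which coerces to i32, so both arms agree.
            let n: i32 = if v % 2 == 0 { v } else { continue };
            return Some(n);
        }
        None
    }

    fn main() {
        assert_eq!(count_to(3), 3);
        assert_eq!(first_even(&[1, 3, 4, 5]), Some(4));
        assert_eq!(first_even(&[1, 3]), None);
    }
    ```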

    In summary, you're not trying to return from the last line of the { ... } block of the loop. You're trying to break out of the loop, which is not a normal return; break hands its value directly to the loop expression, bypassing the block's own result. Writing break counter * 2; simply turns that expression into a statement, which is the conventional style for control flow. If anything follows it in the same block, the semicolon is required to separate the two. The fact that you can leave the semicolon off when break comes last is incidental: break has type !, so the if block type-checks either way, and the loop discards its block's result anyway.
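
    To tie this back to Question 2, here is a hedged sketch contrasting break with a value and a bare break. The function names break_with_value and bare_break are invented for the example.

    ```rust
    // `break EXPR` hands EXPR to the enclosing `loop` as its value.
    fn break_with_value() -> i32 {
        let mut counter = 0;
        loop {
            counter += 1;
            if counter == 10 {
                break counter * 2;
            }
        }
    }

    // A bare `break;` carries no value, so the loop evaluates to ().
    // Writing `break; counter * 2;` is this same `break;` followed by a
    // separate statement that is never reached.
    fn bare_break() -> i32 {
        let mut counter = 0;
        let unit: () = loop {
            counter += 1;
            if counter == 10 {
                break;
            }
        };
        let _ = unit;
        counter
    }

    fn main() {
        assert_eq!(break_with_value(), 20);
        assert_eq!(bare_break(), 10);
    }
    ```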