Tags: performance, rust, memory, union, memory-alignment

How is Result<T, E> in Rust so fast?


If you have been active in the programming community within the last year, you've surely heard praise for Rust's execution speed and performance, as well as for its Result type.

I should probably mention that I am not a Rust developer. In spite of this, or maybe even because of it, I wonder how Rust can be so performant if it uses this Result type, since as far as I can tell the type is implemented as what would be called a union in C: it holds either an error or a return value, of which only one is valid at a given time, plus a flag that indicates whether the Result currently contains an error or a value.
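
For reference, the standard library defines Result as a plain enum, which, as I understand it, the compiler lays out essentially as such a tagged union:

    enum Result<T, E> {
        Ok(T),
        Err(E),
    }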

If I have counted correctly, and assuming the error is stored as a pointer or reference (i.e. it occupies 8 bytes in memory on a 64-bit system), that makes a minimum of 8 bytes for the union plus one byte for the flag, so 9 bytes of memory.

Now with padding I assume that on most systems this will be realigned to occupy 12 bytes. Compared to that, returning a 32-bit int allocates only 4 bytes. Using Result should thus allocate three times as much memory as using an int.

Isn't this an extreme waste of memory? I imagine that in a loop this would add up quite a bit.

I cannot quite see how anyone can claim that Rust is super performant while Result takes up that much memory.

I am aware that there are a few optimization tricks that can decrease memory usage, such as using a NonZero integer type (e.g. NonZeroI32) inside an Option, which lets the compiler use zero as the flag and thereby avoid the extra byte. But for most types this isn't applicable, is it?

If anyone has further insight, I would love to hear it. Please be aware that I am not a Rust developer and am asking this question out of curiosity, as I have observed that in libraries that try to port this feature, memory usage increases drastically.

Surely Rust's Result<T, E> and Option<T> types are better optimized than some ported libraries, but I cannot imagine how this doesn't impact program performance.


Solution

  • You seem to be familiar with APIs that return an integer to report exit status: zero for success and non-zero for failure. An API expressing the same thing in idiomatic Rust will achieve the same or better performance¹ while being more type-safe.

    The first step is to use Result to classify the value as a success or failure. Since we don't need to return anything additional in the successful case, we use ():

    Result<(), i32>
    

    With this, it does use more space than a plain i32, since at least one extra bit of information is needed to convey success vs. failure. Though to address one of your misconceptions: there is no pointer indirection for the error value; it is stored inline. So this would be 4 bytes for the i32 + 1 byte for the discriminant + 3 bytes of padding to maintain 4-byte alignment = 8 bytes.
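
    As a rough sketch (the function name here is made up purely for illustration), such an API and a quick size check might look like this:

    // Hypothetical example: a fallible operation reporting a raw error code.
    fn do_work(fail: bool) -> Result<(), i32> {
        if fail { Err(-1) } else { Ok(()) }
    }

    fn main() {
        // Prints 8 on typical targets: 4 (i32) + 1 (discriminant) + 3 (padding).
        println!("{}", std::mem::size_of::<Result<(), i32>>());
        assert!(do_work(false).is_ok());
    }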

    As you've mentioned, using NonZeroI32 is an option:

    Result<(), NonZeroI32>
    

    That is arguably clearer anyway (receiving Err(0) may otherwise be confusing if you're used to zero meaning success), and it gives Result one extra bit of information to work with: the compiler can use the all-zero bit-pattern, which NonZeroI32 never occupies, to express success, so the size is now only 4 bytes total.
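
    A minimal sketch of that (again, the function is hypothetical), together with its size:

    use std::num::NonZeroI32;

    // Sketch: zero means success, any non-zero value is an error code.
    fn do_work(code: i32) -> Result<(), NonZeroI32> {
        match NonZeroI32::new(code) {
            Some(err) => Err(err),
            None => Ok(()),
        }
    }

    fn main() {
        // Prints 4: the all-zero bit-pattern, unused by NonZeroI32, encodes Ok(()).
        println!("{}", std::mem::size_of::<Result<(), NonZeroI32>>());
        assert!(do_work(0).is_ok());
        assert!(do_work(7).is_err());
    }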

    However, that is still not really idiomatic Rust, since a bare NonZeroI32 is not particularly expressive. More often than not you'll have an error type defined as an enum of error cases:

    enum MySimpleError {
        FailureA,
        FailureB,
        FailureC,
    }
    
    Result<(), MySimpleError>
    

    This is now much more expressive, since the type itself encodes which error values can exist, and it is even smaller: MySimpleError has far fewer than 256 variants, so it can be encoded as a single byte with bit-patterns to spare, such that the whole Result is a single byte.
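
    A quick way to confirm those sizes, using the enum defined above:

    enum MySimpleError { FailureA, FailureB, FailureC } // as defined above

    fn main() {
        // Both print 1: three variants fit in one byte, and the spare
        // bit-patterns of that byte serve as the niche for Ok(()).
        println!("{}", std::mem::size_of::<MySimpleError>());
        println!("{}", std::mem::size_of::<Result<(), MySimpleError>>());
    }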

    ¹ Values smaller than a machine word are not likely to actually improve performance on their own - only potentially by reducing memory pressure, if they can be stored alongside other small values.


    Another common pattern: sometimes error types can get very large (trying to provide as much useful information as possible), but the errors themselves are rare - or at least should be. In this case you may wish to introduce indirection via Box for the error type to keep the on-stack size down.

    Result<(), Box<MyComplexError>>
    

    This will be a single machine word in size, since Box is guaranteed to never be null, so the all-zero bit-pattern can be used for the success case, just as with NonZeroI32.
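
    For example, with a made-up large error type (its fields are purely illustrative):

    // Hypothetical "large" error carrying extra context.
    struct MyComplexError {
        message: String,
        line: u64,
        column: u64,
    }

    fn main() {
        // Prints 8 on a 64-bit target: the Box is a single non-null pointer,
        // so the all-zero bit-pattern is free to encode Ok(()).
        println!("{}", std::mem::size_of::<Result<(), Box<MyComplexError>>>());
        // The unboxed error itself is much larger (40 bytes here on 64-bit).
        println!("{}", std::mem::size_of::<MyComplexError>());
    }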

    You may find that libraries already do this internally. For example, the Error type from serde_json looks like this:

    /// This type represents all possible errors that can occur when serializing or
    /// deserializing JSON data.
    pub struct Error {
        /// This `Box` allows us to keep the size of `Error` as small as possible. A
        /// larger `Error` type was substantially slower due to all the functions
        /// that pass around `Result<T, Error>`.
        err: Box<ErrorImpl>,
    }
    
    Result<(), serde_json::Error> // does not use any extra space
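
    If you want to check that yourself (this assumes serde_json is added as a dependency), the sizes line up with a single pointer:

    fn main() {
        // On a 64-bit target all three print 8: the Result is exactly one pointer wide.
        println!("{}", std::mem::size_of::<usize>());
        println!("{}", std::mem::size_of::<serde_json::Error>());
        println!("{}", std::mem::size_of::<Result<(), serde_json::Error>>());
    }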
    

    Of course there are plenty of cases where additional space for the discriminant must be used - due to the success type, the error type, or both - but that is simply a necessity to express the data you wish to return. Even if an extra machine word is needed, handling errors in this manner is often more efficient overall than the cost of throwing an exception should an error occur.