Tags: error-handling, rust, iterator

Why can you not use ? (question mark) directly on collect() in Rust?


Let's say we have a buffered reader and want to collect the lines from it and propagate any errors using ?:

let lines: Vec<_> = rdr.lines().collect()?;

This does not compile. The compiler reports error[E0277]: the size for values of type `str` cannot be known at compilation time

Searching around, I found that the fix, per https://github.com/rust-lang/rust/issues/49391, is:

let lines: Vec<_> = rdr.lines().collect::<Result<Vec<_>,_>>()?;

My question is why this is needed, and what exactly this ::<Result<Vec<_>,_>> means and does.

Keep in mind I just started learning Rust.


Solution

  • There are a few pieces that come together to mean that the compiler needs a little help:

    collect has a generic type

    We can see from the documentation of Iterator::collect that it takes a generic type called B, and that's the type that it returns. It's a bit unusual for a function to return a generic type (more commonly, they accept parameters which have generic types), but returning a generic type can make for some really interesting, versatile APIs, and collect is definitely one of them! The documentation of the collect method talks about this a bit.

    Simplifying your code a little, this means that the compiler can't just look at the code:

    let lines = rdr.lines().collect();
    

    and know what type lines should have. It might be able to infer this from how lines is used later: for instance, if you returned lines from the function, it could tell that the return type of collect must be exactly the return type of the function, and if you passed it as an argument to another function, it could unify those types. On its own, though, this line isn't quite enough context.
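To make the "generic return type" point concrete, here is a minimal sketch in which the same kind of iterator is collected into three different containers. The function name collect_three_ways and the sample input are made up for illustration; the only thing telling collect what to build is the type annotation on each binding.

```rust
use std::collections::HashSet;

/// Collects the same words into three different containers.
/// Only the annotation on each binding tells `collect` what to build.
fn collect_three_ways(words: &[&str]) -> (Vec<String>, HashSet<String>, String) {
    let as_vec: Vec<String> = words.iter().map(|w| w.to_string()).collect();
    let as_set: HashSet<String> = words.iter().map(|w| w.to_string()).collect();
    // `String` implements `FromIterator<&str>`, so the pieces are concatenated.
    let as_string: String = words.iter().copied().collect();
    (as_vec, as_set, as_string)
}

fn main() {
    let (as_vec, as_set, as_string) = collect_three_ways(&["a", "b", "a"]);
    println!("{:?} / {} unique / {}", as_vec, as_set.len(), as_string);
}
```

Removing any one of the annotations leaves the compiler with no way to choose between these targets, which is the situation in the question.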

    So - what context could the compiler use to work out the type it should be collecting to... Well, the fact that you have a ? gives it a hint, but:

    ? isn't just for Results

    ? can be used for any type which implements the (currently unstable, and probably about to change) Try trait. This means that the compiler can't use the question-mark to infer that the type you're collecting to is a Result - maybe you're collecting to an Option, or some other type that implements Try.
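As a small illustration that ? is not tied to Result, the sketch below collects into an Option<Vec<_>> and applies ? to that Option. The function digits is a hypothetical example, not something from the question; it relies on char::to_digit returning an Option.

```rust
/// Converts every character to a decimal digit.
/// Returns `None` as soon as one character is not a digit.
fn digits(text: &str) -> Option<Vec<u32>> {
    // Here `collect` builds an `Option<Vec<u32>>`, not a `Result`.
    let collected: Option<Vec<u32>> = text.chars().map(|c| c.to_digit(10)).collect();
    // `?` on an `Option` returns `None` early, or unwraps the `Some`.
    let values = collected?;
    Some(values)
}

fn main() {
    println!("{:?}", digits("123"));
    println!("{:?}", digits("1x3"));
}
```

So when the compiler sees collect()?, a Result is only one of several types that would fit.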

    ? also calls into!

    Worse than that, ? doesn't just return the error branch of a Result as-is; it will try to perform a conversion if it needs to. For example, if you have a function which returns Result<(), Box<String>>, you can write Err(String::from("foo"))? and Rust will turn this into something like:

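    let result: Result<(), String> = Err(String::from("foo"));
    if let Err(err) = result {
        // The converted error is wrapped in Err before being returned.
        return Err(err.into());
    }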
    

    Because the Into trait is implemented for these types, a String can be converted to a Box<String> with Into, so the ? operator can save you needing to do the conversion yourself.

    However, this adds a whole level of indirection, where the compiler can't always make some "obvious" inferences: it's possible that the error type collect() returns isn't the same as the error type in the function's return type, but can just be converted to it!
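The conversion can be seen in a complete program. In this sketch, fail_with_string and run are hypothetical names chosen for illustration; the point is only that the two functions have different error types and ? bridges them, using the standard library's From<T> for Box<T> implementation.

```rust
/// Hypothetical helper that always fails with a plain `String` error.
fn fail_with_string() -> Result<(), String> {
    Err(String::from("foo"))
}

/// The error type here is `Box<String>`, not `String`.
fn run() -> Result<(), Box<String>> {
    // `?` converts the `String` into a `Box<String>` before returning it.
    fail_with_string()?;
    Ok(())
}

fn main() {
    println!("{:?}", run());
}
```

Because this compiles, the compiler cannot work backwards from the function's error type to the error type of the expression that ? is applied to.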

    Type ascription

    One place you can put this information is on a variable, by ascribing its type (that is, writing a type annotation on the let binding). So you could write this code:

    let lines: Result<Vec<_>, _> = rdr.lines().collect();
    let lines = lines?;
    

    (Each of those _s is a place where you're asking the compiler to work out what the type that should be there is from the information it has available to it. In fact, let lines = rdr.lines().collect(); is the same as let lines: _ = rdr.lines().collect(); - if the compiler can work it out, it will, but if it can't, it will give an error saying you need to give it more information).

    In this code snippet we're saying: "The return type of collect() needs to be a Result where the Ok type is a Vec of something (compiler, you work it out), and the Err type is, well, compiler please work that out too!"

    That gives quite a lot of information to the compiler, and in a lot of cases that's sufficient, though there may be times where we need to fill in more of those _s ourselves.
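Here is the two-statement version as a runnable sketch. The function read_lines and the use of std::io::Cursor as an in-memory reader are assumptions made so the example is self-contained; the question's rdr could be any type implementing BufRead.

```rust
use std::io::{self, BufRead, Cursor};

/// Reads all lines from any buffered reader, propagating the first I/O error.
fn read_lines<R: BufRead>(rdr: R) -> io::Result<Vec<String>> {
    // The annotation tells `collect` to build a `Result`; the `_`s are
    // inferred as `String` and `io::Error` from the item type of `lines()`.
    let lines: Result<Vec<_>, _> = rdr.lines().collect();
    let lines = lines?;
    Ok(lines)
}

fn main() -> io::Result<()> {
    // `Cursor` is an in-memory stand-in for a file or socket.
    let lines = read_lines(Cursor::new("one\ntwo\n"))?;
    println!("{:?}", lines);
    Ok(())
}
```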

    Type ascription requires a variable - enter the turbofish

    But the code I wrote above is a bit more verbose than the code you wrote. We have two statements rather than one, we make two variables rather than one. And a lot of the point of the ? operator is to allow us to write concise code!

    The syntax you found, colloquially known as the turbofish (because ::<_> looks kind of funny), is a way to be explicit about what the generic type that the collect function takes is:

    collect::<Result<Vec<_>,_>>()
    

    This is saying: call the function collect, and its generic type should be Result<Vec<_>, _>. Just like when we ascribed the type of a variable, this is a way to give the compiler information about what the generic type of the function should be.

    So putting that all together, we can write:

    let lines = rdr.lines().collect::<Result<Vec<_>,_>>()?;
    

    This tells the compiler: call collect, and the generic type I want it to return is a Result of a Vec of something (the compiler can work out the rest). With that, it knows what you're looking to collect to, and ? can then either unwrap the Vec or return the error early.
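Putting the one-line form into a complete program, as a sketch: read_lines_turbofish is a hypothetical name, and byte slices are used as readers only because &[u8] implements BufRead and needs no file. The second call feeds in bytes that are not valid UTF-8, so lines() yields an Err and collect stops at that error instead of building a Vec.

```rust
use std::io::{self, BufRead};

/// Same behavior as the two-statement version, written with the turbofish.
fn read_lines_turbofish<R: BufRead>(rdr: R) -> io::Result<Vec<String>> {
    let lines = rdr.lines().collect::<Result<Vec<_>, _>>()?;
    Ok(lines)
}

fn main() {
    // A byte slice implements `BufRead`, so it can stand in for a file.
    let ok = read_lines_turbofish(&b"one\ntwo\n"[..]);
    println!("{:?}", ok);

    // The second line is not valid UTF-8, so reading it fails and the
    // whole `collect` evaluates to that error.
    let bad = read_lines_turbofish(&b"ok\n\xff\xfe\n"[..]);
    println!("is_err = {}", bad.is_err());
}
```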