Passing Primitive Types Behind a Struct Reference by Value in Rust -or- When is My Variable Bound?


Apologies for the code bomb, but this question is best illustrated with a minimal example. Consider the following two implementations (new_1 and new_2) of passing data from behind a struct reference:

#[derive(Debug, Deserialize)]
pub struct PrometheusConfig
{
    pub bind: String,
    pub port: u16,
}

#[derive(Debug, Deserialize)]
pub struct AppConfig
{
    pub prometheus: PrometheusConfig,
}

struct PrometheusEndpoint<'config>
{
    config: &'config PrometheusConfig,
    metrics_task: tokio::task::JoinHandle<()>,
}

impl<'config> PrometheusEndpoint<'config>
{
    // EXAMPLE #1 - THIS FAILS TO COMPILE
    fn new_1(config: &'config PrometheusConfig, runtime: std::rc::Rc<tokio::runtime::Runtime>) -> Result<PrometheusEndpoint, i32>
    {
        // ...

        // Start webserver
        let metrics_task = runtime.spawn(async move {
            warp::serve(metrics_route)
            .run((std::net::Ipv4Addr::from_str(config.bind.as_str()).unwrap().octets(), config.port))
            .await
        });

        // ...
    }

    // EXAMPLE #2 - THIS COMPILES
    fn new_2(config: &'config PrometheusConfig, runtime: std::rc::Rc<tokio::runtime::Runtime>) -> Result<PrometheusEndpoint, i32>
    {
        // ...

        // Start webserver
        let metrics_task = runtime.spawn(async move {
            let bind_octets = std::net::Ipv4Addr::from_str(config.bind.as_str()).unwrap().octets();
            let port_copy = config.port;
            warp::serve(metrics_route)
            .run((bind_octets, port_copy))
            .await
        });

        // ...
    }
}

fn main()
{
    // Create async runtime for rest of application
    let rt = std::rc::Rc::new(Runtime::new().unwrap());

    // Parse config file
    let config = AppConfig::new(CONFIG_FILE_NAME).unwrap();

    // Create Prometheus endpoint to allow remote server to scrape our metrics
    let prom = PrometheusEndpoint::new(&config.prometheus, rt.clone()).unwrap();

    // ...
}

In both cases, the intent is to pass the host and port into the async runner. Unfortunately, they do not behave the same in this regard; new_2 works while new_1 fails with:

119 |           let metrics_task = runtime.spawn(async move {
    |  ____________________________^
120 | |             warp::serve(metrics_route)
121 | |             .run((std::net::Ipv4Addr::from_str(config.bind.as_str()).unwrap().octets(), config.port))
122 | |             .await
123 | |         });
    | |          ^
    | |          |
    | |__________`config` escapes the associated function body here
    |            argument requires that `'config` must outlive `'static`

What I can't figure out in all of this is why using an intermediate variable "fixes" this failure. Note that this same error happens for both the bind and port variables; I will be focusing on the port variable as it is a simpler type with fewer operations performed on it.

In both cases, I would expect config.port to dereference (via the . operator) config to get to the underlying u16 which should then be moved into the async by value. The config reference isn't used after this so I don't understand how the lifetime of config is being exposed to the async task.

Some concrete questions from this:

  1. Shouldn't it be the value of the u16 on the stack whose lifetime matters here? Shouldn't the primitive have been moved on creation of the underlying task due to async move?
  2. How does using an intermediate variable help here as it seems to just imply another move?
  3. What is the canonical method for passing primitive data by value to another context from behind a struct reference?
  4. Is there a better / canonical method of explicitly specifying the lifetime of the task? In my usage, the config will always outlive the worker tasks by design.

Solution

  • I got the same error "argument requires that 'config must outlive 'static" with your example #2 as well. Moving the assignments to bind_octets and port_copy out of the async move block makes it compile for me:

        fn new_2_fixed(config: &'config PrometheusConfig, runtime: std::rc::Rc<tokio::runtime::Runtime>) -> Result<PrometheusEndpoint, i32>
        {
            // ...
    
            let bind_octets = std::net::Ipv4Addr::from_str(config.bind.as_str()).unwrap().octets();
            let port_copy = config.port;
    
            // Start webserver
            let metrics_task = runtime.spawn(async move {
                warp::serve(metrics_route)
                .run((bind_octets, port_copy))
                .await
            });
    
            // ...
        }
    

    The capture rules for closures and async blocks changed over time. RFC 0231 introduced the capture rules for move closures (which are reused for async move) in 2014:

    Free variables referenced by a move || closure are always captured by value.

    This was amended in 2018 by RFC 2229:

    This RFC proposes that closure capturing should be minimal rather than maximal. Conceptually, existing rules regarding borrowing and moving disjoint fields should be applied to capturing.

    A capture expression is minimal if it produces a value that is used by the closure in its entirety (e.g. is a primitive, is passed outside the closure, etc.) or if making the expression more precise would require one the following.

    • a call to an impure function
    • an illegal move (for example, out of a Drop type)

    When generating a capture expression, we must decide if the output should be owned or if it can be a reference. […] A move closure will always produce owned data unless the captured binding does not have ownership.

    To answer your questions:

    1. Shouldn't it be the value of the u16 on the stack whose lifetime matters here? Shouldn't the primitive have been moved on creation of the underlying task due to async move?

    Ideally it should be. But unless the minimizing rules from RFC 2229 apply, every variable named inside a move || closure or an async move block is moved/copied into the block.

    The async move block still captures config because config is a reference. As far as I could check, in a move context the capture is only narrowed up to the first dereference of a reference, so reading config.port through the reference does not minimize the capture to the copied field. The reference config itself is copied into the block instead of a copy of config.port, and that reference carries the 'config lifetime with it.
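
    One way to observe this without tokio or warp is to compare the sizes of two move closures. This is only a sketch: Config and capture_sizes are hypothetical names, plain closures stand in for the async move block, and closure layout is not a documented guarantee, so treat the sizes as an observation rather than a rule.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical stand-in for PrometheusConfig.
struct Config {
    bind: String,
    port: u16,
}

// Returns the sizes of two move closures: one that reads `config.port`
// inside the closure, and one that uses a `port` copied beforehand.
fn capture_sizes(config: &Config) -> (usize, usize) {
    // `config` is named inside the closure, so the reference is captured.
    let through_ref = move || config.port;

    // The field is copied first, so only the u16 is captured.
    let port = config.port;
    let by_value = move || port;

    (size_of_val(&through_ref), size_of_val(&by_value))
}

fn main() {
    let config = Config { bind: String::from("127.0.0.1"), port: 9090 };
    let (through_ref, by_value) = capture_sizes(&config);
    println!("config is {}:{}", config.bind, config.port);
    println!(
        "closure naming config.port: {through_ref} bytes (a reference is {} bytes)",
        size_of::<&Config>()
    );
    println!("closure using copied port: {by_value} bytes");
}
```

    The first closure is as large as a reference, which is why its lifetime is tied to 'config, while the second only holds the two bytes of the port.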

    2. How does using an intermediate variable help here as it seems to just imply another move?

    As long as the intermediate variable is outside the async move block, it helps by decoupling the moved value from the original config reference. When the implicit copy of port happens outside the async move block, then the reference will not be captured by the block.

    3. What is the canonical method for passing primitive data by value to another context from behind a struct reference?

    As demonstrated above, creating a copy of the primitive data in a let binding just before the async move block or move || closure works and is common in the Rust ecosystem.
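
    A variation of the same idiom keeps the copies in a small block that evaluates to the closure, so the helper bindings do not leak into the rest of the function. This is a sketch under assumptions: Config and spawn_worker are hypothetical, and std::thread::spawn stands in for runtime.spawn because it also requires a 'static argument.

```rust
use std::net::Ipv4Addr;
use std::str::FromStr;
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for PrometheusConfig.
struct Config {
    bind: String,
    port: u16,
}

// Spawns a worker that receives the bind octets and port by value.
fn spawn_worker(config: &Config) -> JoinHandle<([u8; 4], u16)> {
    thread::spawn({
        // `config` is only read here, before the closure exists,
        // so the closure never captures the reference.
        let bind_octets = Ipv4Addr::from_str(config.bind.as_str()).unwrap().octets();
        let port = config.port;
        move || (bind_octets, port)
    })
}

fn main() {
    let config = Config { bind: String::from("127.0.0.1"), port: 9090 };
    let (octets, port) = spawn_worker(&config).join().unwrap();
    println!("worker received {octets:?} and port {port}");
}
```

    The same shape should carry over to an async move block placed as the last expression of the inner block.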

    4. Is there a better / canonical method of explicitly specifying the lifetime of the task? In my usage, the config will always outlive the worker tasks by design.

    I'm not quite sure whether it can be applied here, but "scoped async tasks" (analogous to scoped threads) might help. Sadly, because of implementation difficulties and soundness issues these aren't implemented in tokio, but there are crates on crates.io that try to implement them (use at your own risk).
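
    For comparison, this is roughly what the scoped-thread analogue looks like with std::thread::scope from the standard library. It is a synchronous sketch, not a tokio API, and Config and describe_in_scoped_thread are hypothetical names.

```rust
use std::thread;

// Hypothetical stand-in for PrometheusConfig.
struct Config {
    bind: String,
    port: u16,
}

// Borrows `config` inside a spawned thread. This is allowed because
// the scope joins every thread before it returns.
fn describe_in_scoped_thread(config: &Config) -> String {
    thread::scope(|s| {
        let handle = s.spawn(|| format!("{}:{}", config.bind, config.port));
        handle.join().unwrap()
    })
}

fn main() {
    let config = Config { bind: String::from("127.0.0.1"), port: 9090 };
    println!("{}", describe_in_scoped_thread(&config));
}
```

    No 'static bound is needed here because the borrow cannot outlive the scope; the cost is that the caller blocks until the scoped threads finish.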

    For non-scoped tasks, the async runtime must assume that the task might outlive any non-static lifetime, so the only way would be to make config 'static, e.g. by putting it into a lazily-initialized global static or by keeping the prometheus config in an Arc (an Rc is not Send, which spawn on a multi-threaded runtime requires).
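
    A minimal sketch of the shared-ownership route, assuming std::sync::Arc is used because the spawned work must be Send. Config and spawn_with_shared_config are hypothetical, and std::thread::spawn again stands in for runtime.spawn.

```rust
use std::sync::Arc;
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for PrometheusConfig.
struct Config {
    bind: String,
    port: u16,
}

// The worker owns one handle to the config, so no borrowed lifetime is involved.
fn spawn_with_shared_config(config: Arc<Config>) -> JoinHandle<String> {
    thread::spawn(move || format!("{}:{}", config.bind, config.port))
}

fn main() {
    let config = Arc::new(Config { bind: String::from("127.0.0.1"), port: 9090 });

    // Cloning the Arc only bumps a reference count; the Config is not copied.
    let handle = spawn_with_shared_config(Arc::clone(&config));
    println!("worker saw {}", handle.join().unwrap());

    // The original handle is still usable after the worker has finished.
    println!("main still sees port {}", config.port);
}
```

    With this shape the struct holding the task would store an Arc<Config> instead of a &'config reference, which removes the lifetime parameter altogether.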

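    In any case, to prevent the problem with unexpected captures you can try to avoid async blocks where possible. In the version below, the arguments to run are evaluated eagerly, before the resulting future is handed to spawn, so config is never captured: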

        fn new_3(config: &'config PrometheusConfig, runtime: std::rc::Rc<tokio::runtime::Runtime>) -> Result<PrometheusEndpoint, i32>
        {
            // ...
    
            // Start webserver
            let metrics_task = runtime.spawn(
                warp::serve(metrics_route)
                .run((
                      std::net::Ipv4Addr::from_str(config.bind.as_str()).unwrap().octets(),
                      config.port
                ))
            );
    
            // ...
        }