Apologies for the code-bomb, but this question is best illustrated via a minimal example. Consider the following two implementations (`new_1` and `new_2`) of passing data from behind a struct reference:
use std::str::FromStr; // brings `Ipv4Addr::from_str` into scope

use serde::Deserialize;
use tokio::runtime::Runtime;

#[derive(Debug, Deserialize)]
pub struct PrometheusConfig
{
    pub bind: String,
    pub port: u16,
}

#[derive(Debug, Deserialize)]
pub struct AppConfig
{
    pub prometheus: PrometheusConfig,
}

struct PrometheusEndpoint<'config>
{
    config: &'config PrometheusConfig,
    metrics_task: tokio::task::JoinHandle<()>,
}

impl<'config> PrometheusEndpoint<'config>
{
    // EXAMPLE #1 - THIS FAILS TO COMPILE
    fn new_1(config: &'config PrometheusConfig, runtime: std::rc::Rc<tokio::runtime::Runtime>) -> Result<PrometheusEndpoint, i32>
    {
        // ...
        // Start webserver
        let metrics_task = runtime.spawn(async move {
            warp::serve(metrics_route)
                .run((std::net::Ipv4Addr::from_str(config.bind.as_str()).unwrap().octets(), config.port))
                .await
        });
        // ...
    }

    // EXAMPLE #2 - THIS COMPILES
    fn new_2(config: &'config PrometheusConfig, runtime: std::rc::Rc<tokio::runtime::Runtime>) -> Result<PrometheusEndpoint, i32>
    {
        // ...
        // Start webserver
        let metrics_task = runtime.spawn(async move {
            let bind_octets = std::net::Ipv4Addr::from_str(config.bind.as_str()).unwrap().octets();
            let port_copy = config.port;
            warp::serve(metrics_route)
                .run((bind_octets, port_copy))
                .await
        });
        // ...
    }
}

fn main()
{
    // Create async runtime for rest of application
    let rt = std::rc::Rc::new(Runtime::new().unwrap());
    // Parse config file
    let config = AppConfig::new(CONFIG_FILE_NAME).unwrap();
    // Create Prometheus endpoint to allow remote server to scrape our metrics
    // (`new` here stands for either `new_1` or `new_2` above)
    let prom = PrometheusEndpoint::new(&config.prometheus, rt.clone()).unwrap();
    // ...
}
In both cases, the intent is to pass the host and port into the async runner. Unfortunately, they do not behave the same in this regard; `new_2` works while `new_1` fails with:
119 | let metrics_task = runtime.spawn(async move {
| ____________________________^
120 | | warp::serve(metrics_route)
121 | | .run((std::net::Ipv4Addr::from_str(config.bind.as_str()).unwrap().octets(), config.port))
122 | | .await
123 | | });
| | ^
| | |
| |__________`config` escapes the associated function body here
| argument requires that `'config` must outlive `'static`
What I can't figure out in all of this is why using an intermediate variable "fixes" this failure. Note that this same error happens for both the `bind` and `port` variables; I will be focusing on the `port` variable as it is a simpler type with fewer operations performed on it.
In both cases, I would expect `config.port` to dereference `config` (via the `.` operator) to get to the underlying `u16`, which should then be moved into the `async` block by value. The `config` reference isn't used after this, so I don't understand how the lifetime of `config` is being exposed to the async task.
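Some concrete questions from this:
1. Shouldn't it be the value of the `u16` on the stack whose lifetime matters here? Shouldn't the primitive have been moved on creation of the underlying task due to `async move`?
2. How does using an intermediate variable help here as it seems to just imply another move?
3. What is the canonical method for passing primitive data by value to another context from behind a struct reference?
4. Is there a better / canonical method of explicitly specifying the lifetime of the task? In my usage, the config will always outlive the worker tasks by design.

I got the same error ("argument requires that `'config` must outlive `'static`") with your example #2, but after moving the assignments to `bind_octets` and `port_copy` out of the `async move` block, it compiles for me: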
fn new_2_fixed(config: &'config PrometheusConfig, runtime: std::rc::Rc<tokio::runtime::Runtime>) -> Result<PrometheusEndpoint, i32>
{
    // ...
    let bind_octets = std::net::Ipv4Addr::from_str(config.bind.as_str()).unwrap().octets();
    let port_copy = config.port;
    // Start webserver
    let metrics_task = runtime.spawn(async move {
        warp::serve(metrics_route)
            .run((bind_octets, port_copy))
            .await
    });
    // ...
}
The capture rules for closures and async blocks changed over time. RFC 0231 introduced the capture rules for `move` closures (which are reused for `async move`) in 2014:
Free variables referenced by a `move ||` closure are always captured by value.
This was amended in 2018 by RFC 2229, whose rules took effect with the 2021 edition:
This RFC proposes that closure capturing should be minimal rather than maximal. Conceptually, existing rules regarding borrowing and moving disjoint fields should be applied to capturing.
A capture expression is minimal if it produces a value that is used by the closure in its entirety (e.g. is a primitive, is passed outside the closure, etc.) or if making the expression more precise would require one the following.
- a call to an impure function
- an illegal move (for example, out of a Drop type)
When generating a capture expression, we must decide if the output should be owned or if it can be a reference. […] A move closure will always produce owned data unless the captured binding does not have ownership.
To answer your questions:
1. Shouldn't it be the value of the u16 on the stack whose lifetime matters here? Shouldn't the primitive have been moved on creation of the underlying task due to async move?
Ideally it should be. But unless the minimizing rules from RFC 2229 apply, every variable named inside a `move ||` closure or an `async move` block is moved/copied into the block.
The `async move` block still captures `config`, because `config` is a reference and, as far as I could check, implicitly copying a field through a reference does not get the capture minimized to the copied field. Instead of a copy of `config.port`, the reference `config` itself is moved/copied into the block.
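One way to see which of the two gets captured is to compare closure sizes. The sketch below is only an illustration: it uses plain `move` closures instead of `async move` blocks (the capture rules are shared, as noted above), and `Config` and `closure_sizes` are hypothetical names, not part of the question's code.

```rust
use std::mem::size_of_val;

// Hypothetical stand-in for the question's `PrometheusConfig`.
struct Config {
    port: u16,
}

/// Returns the sizes of two `move` closures:
/// one that reads `config.port` inside its body,
/// and one that uses a copy made before the closure was created.
fn closure_sizes(config: &Config) -> (usize, usize) {
    // The field is read through the reference inside the closure,
    // so the closure stores the reference `config` itself.
    let through_reference = move || config.port;

    // The `u16` is copied out first; the closure stores only that copy.
    let port_copy = config.port;
    let copied_first = move || port_copy;

    (size_of_val(&through_reference), size_of_val(&copied_first))
}
```

The first number equals `size_of::<&Config>()` and the second equals `size_of::<u16>()`, which matches the explanation that the reference, not the field, ends up inside the first closure.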
2. How does using an intermediate variable help here as it seems to just imply another move?
As long as the intermediate variable is outside the `async move` block, it helps by decoupling the moved value from the original `config` reference. When the implicit copy of `port` happens outside the `async move` block, the reference is not captured by the block.
3. What is the canonical method for passing primitive data by value to another context from behind a struct reference?
As demonstrated above, creating a copy of the primitive data in a `let` binding just before the `async move` block or `move ||` closure works and is a common pattern in the Rust ecosystem.
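The same pattern can be tried without any async runtime. The following is a minimal sketch, assuming `std::thread::spawn` as a stand-in for `Runtime::spawn` (both demand a `'static` argument); `Config` and `spawn_worker` are hypothetical names chosen for illustration.

```rust
use std::thread;

// Hypothetical stand-in for the question's `PrometheusConfig`.
struct Config {
    bind: String,
    port: u16,
}

// `thread::spawn` requires a `'static` closure, much like `Runtime::spawn`
// requires a `'static` future, so it serves as a std-only stand-in here.
fn spawn_worker(config: &Config) -> thread::JoinHandle<String> {
    // Copy/clone everything the worker needs while still outside the closure.
    let bind = config.bind.clone();
    let port = config.port;

    // Only the owned `bind` and `port` are captured; `config` is not named
    // inside the closure, so its lifetime never reaches `thread::spawn`.
    thread::spawn(move || format!("{}:{}", bind, port))
}
```

Note that `config` may be an ordinary short-lived borrow here; moving either `let` binding into the closure body would reintroduce the lifetime error.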
4. Is there a better / canonical method of explicitly specifying the lifetime of the task? In my usage, the config will always outlive the worker tasks by design.
I'm not quite sure if that can be applied here, but scoped async tasks (analogous to scoped threads) might help. Sadly, because of implementation difficulties and soundness issues, these aren't implemented in `tokio`, but there are crates on crates.io that try to implement them (use at your own risk).
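For comparison, this is what the scoped-thread counterpart looks like in the standard library. It is a sketch of `std::thread::scope` only, not of any async API; `Config` and `describe_in_scoped_thread` are hypothetical names.

```rust
use std::thread;

// Hypothetical stand-in for the question's `PrometheusConfig`.
struct Config {
    bind: String,
    port: u16,
}

fn describe_in_scoped_thread(config: &Config) -> String {
    // `thread::scope` joins every thread spawned inside it before returning,
    // which is why the closure may borrow `config` directly.
    thread::scope(|scope| {
        let handle = scope.spawn(|| format!("{}:{}", config.bind, config.port));
        handle.join().unwrap()
    })
}
```

The borrow is accepted because the scope cannot end before the thread does; a detached task offers no such guarantee, hence the `'static` requirement.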
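For non-scoped tasks, the async runtime must assume that the task might outlive any non-static lifetime, so the only way would be to make `config` `'static`, e.g. by putting it into a lazily-initialized global `static`, or to give the task shared ownership by keeping the Prometheus config in an `Arc` (an `Rc` is not `Send`, so it cannot be moved into a future passed to `Runtime::spawn`).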
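A minimal sketch of the shared-ownership option, again assuming `std::thread::spawn` in place of `Runtime::spawn` and using `Arc`, since the handle crosses a thread boundary. `Config` and `spawn_with_shared_config` are hypothetical names.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for the question's `PrometheusConfig`.
struct Config {
    bind: String,
    port: u16,
}

fn spawn_with_shared_config(config: &Arc<Config>) -> thread::JoinHandle<String> {
    // Cloning the `Arc` only bumps a reference count; the worker then owns
    // its own handle, so no borrowed lifetime is involved.
    let config = Arc::clone(config);
    thread::spawn(move || format!("{}:{}", config.bind, config.port))
}
```

The caller keeps its own `Arc<Config>` and can continue to read the config while the worker runs.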
In any case, to prevent the problem with unexpected captures, you can try to avoid `async` blocks where possible:
fn new_3(config: &'config PrometheusConfig, runtime: std::rc::Rc<tokio::runtime::Runtime>) -> Result<PrometheusEndpoint, i32>
{
    // ...
    // Start webserver
    let metrics_task = runtime.spawn(
        warp::serve(metrics_route)
            .run((
                std::net::Ipv4Addr::from_str(config.bind.as_str()).unwrap().octets(),
                config.port
            ))
    );
    // ...
}
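Why this works can be sketched with std types only: function arguments are evaluated before the callee builds whatever gets spawned, so the spawned value holds owned data and never sees `config`. In the sketch below, the hypothetical `make_task` plays the role that `run` plays above, a closure stands in for the future, and `std::thread::spawn` stands in for `Runtime::spawn`; `Config` and `spawn_without_capturing_config` are likewise illustrative names.

```rust
use std::net::Ipv4Addr;
use std::str::FromStr;
use std::thread;

// Hypothetical stand-in for the question's `PrometheusConfig`.
struct Config {
    bind: String,
    port: u16,
}

// Takes its inputs by value and returns a `'static` task built from them.
fn make_task(octets: [u8; 4], port: u16) -> impl FnOnce() -> String + Send + 'static {
    move || format!("{}.{}.{}.{}:{}", octets[0], octets[1], octets[2], octets[3], port)
}

fn spawn_without_capturing_config(config: &Config) -> thread::JoinHandle<String> {
    // Both arguments are computed here, in the caller, before `make_task`
    // runs; the returned task therefore contains no reference to `config`.
    thread::spawn(make_task(
        Ipv4Addr::from_str(config.bind.as_str()).unwrap().octets(),
        config.port,
    ))
}
```

As in the original code, the `unwrap()` panics if `bind` is not a valid IPv4 address; here that happens in the caller, before anything is spawned.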