
Share static lazy-initialized object containing `Rc` refs among multiple threads with `thread_local!` and `OnceCell`


I have split my tests into several similar sections. Within each section, results are compared against a static test string, written in a dedicated language under test (here called dum) and parsed with pest.


Here is the global structure of my MWE.

$ tree
.
├── Cargo.lock
├── Cargo.toml
├── src
│   └── main.rs
└── tests
    ├── dum.pest
    ├── section_1.rs
    └ .. imagine more similar sections here.
  • Cargo.toml
[package]
...
edition = "2018"

[dev-dependencies]
pest = "*"
pest_derive = "*"
once_cell = "*"
lazy_static = "*"

  • main.rs only contains fn main() {}.
  • dum.pest is a dummy any = { ANY* }.
  • section_1.rs preamble is:
use pest_derive::Parser;
use pest::{iterators::Pairs, Parser};

// Compile dedicated grammar.
#[derive(Parser)]
#[grammar = "../tests/dum.pest"]
pub struct DumParser;

// Here is the static test string to run section 1 against.
static SECTION_1: &'static str = "Content to parse for section 1.";

// Type of the result expected to be globally available in the whole test section.
type ParseResult = Pairs<'static, Rule>;

Now, my first naive attempt to make the parse result available to all test functions was:

// Naive lazy_static! attempt:
use lazy_static::lazy_static;
lazy_static! {
    static ref PARSED: ParseResult = {
        DumParser::parse(Rule::any, &*SECTION_1).expect("Parse failed.")
    };
}
#[test]
fn first() {
    println!("1: {:?} parsed to {:?}", &*SECTION_1, *PARSED);
}
#[test]
fn second() {
    println!("2: {:?} parsed to {:?}", &*SECTION_1, *PARSED);
}

This does not compile. According to pest, it's because they use inner Rc references that cannot be safely shared among threads, and I think cargo test does spin a new thread for each #[test] function.

The suggested solution involves the use of thread_local! and OnceCell, but I cannot figure it out. The following two attempts:

// Naive thread_local! attempt:
thread_local! {
    static PARSED: ParseResult = {
        println!(" + + + + + + + PARSING! + + + + + + + "); // /!\ SHOULD APPEAR ONLY ONCE!
        DumParser::parse(Rule::any, &*SECTION_1).expect("Parse failed.")
    };
}
#[test]
fn first() {
    PARSED.with(|p| println!("1: {:?} parsed to {:?}", &*SECTION_1, p));
}
#[test]
fn second() {
    PARSED.with(|p| println!("2: {:?} parsed to {:?}", &*SECTION_1, p));
}

and

// Naive OnceCell attempt:
use once_cell::sync::OnceCell;
thread_local! {
    static PARSED: OnceCell<ParseResult> = {
        println!(" + + + + + + + PARSING! + + + + + + + "); // /!\ SHOULD APPEAR ONLY ONCE!
        let once = OnceCell::new();
        once.set(DumParser::parse(Rule::any, &*SECTION_1).expect("Parse failed."))
            .expect("Already set.");
        once
    };
}
#[test]
fn first() {
    PARSED.with(|p| println!("1: {:?} parsed_to {:?}", &*SECTION_1, p.get().unwrap()));
}
#[test]
fn second() {
    PARSED.with(|p| println!("2: {:?} parsed_to {:?}", &*SECTION_1, p.get().unwrap()));
}

Both compile and run fine. But the output of cargo test -- --nocapture suggests that the parsing is actually done once for each test function:

running 2 tests
 + + + + + + + PARSING! + + + + + + +
 + + + + + + + PARSING! + + + + + + +
1: "Content to parse for section 1." parsed_to [Pair { rule: any, span: Span { str: "Content to parse for section 1.", start: 0, end: 31 }, inner: [] }]
2: "Content to parse for section 1." parsed_to [Pair { rule: any, span: Span { str: "Content to parse for section 1.", start: 0, end: 31 }, inner: [] }]
test first ... ok
test second ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

This reveals that I have failed in both my attempts.

What is wrong with these approaches?
How do I make the parsing occur only once per section?


Solution

  • Why isn't lazy_static! suitable?

    Whether cargo test spins up a new thread per test or not is actually irrelevant.

    A static variable is global and therefore potentially reachable from any thread, so the compiler requires its type to be Sync even if the program never spawns a thread.

    Since Rc is not Sync (it cannot be shared between threads), a static holding pest's Rc-based parse result cannot compile.
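The Sync requirement can be checked without pest at all. The following is a minimal sketch using only the standard library; `assert_sync`, `GREETING` and `local_copy` are hypothetical names introduced for illustration.

```rust
use std::rc::Rc;
use std::sync::Arc;

// Compiles only when `T` may be shared between threads.
fn assert_sync<T: Sync>() {}

// Accepted: `&'static str` is `Sync`, so it can live in a `static`.
static GREETING: &str = "hello";

// Rejected if uncommented, because `Rc<str>` is not `Sync`:
// static SHARED: Option<Rc<str>> = None;

// An `Rc` is fine as long as it stays on the thread that created it.
fn local_copy() -> Rc<str> {
    Rc::from(GREETING)
}

fn main() {
    assert_sync::<&'static str>();
    assert_sync::<Arc<str>>();
    // assert_sync::<Rc<str>>(); // does not compile: `Rc<str>` is not `Sync`
    println!("{}", local_copy());
}
```

The compiler applies this check to every `static`, whether or not any thread is ever spawned.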

  • Why isn't thread_local! suitable?

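    There is one instance of a thread_local! variable per thread, as the name suggests.

    The initializer inside thread_local! does not run when the thread is created; the variable is lazily initialized on first access from that thread. Your output shows the message twice, so the two tests ran on different threads and each one initialized its own copy. Wrapping the value in a OnceCell changes nothing, because the cell itself is per-thread too.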
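The per-thread behavior can be reproduced without pest or a test harness. This is a minimal sketch using only the standard library; the `String` value stands in for the parse result, and `INIT_RUNS` and `init_runs_for` are hypothetical names introduced for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many times the thread-local initializer has run, across all threads.
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Stand-in for the expensive parse. The initializer runs lazily,
    // on the first access from each thread.
    static PARSED: String = {
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
        String::from("parsed")
    };
}

// Spawns `n` threads that each read PARSED twice, and returns how many
// initializer runs those threads caused.
fn init_runs_for(n: usize) -> usize {
    let before = INIT_RUNS.load(Ordering::SeqCst);
    let handles: Vec<_> = (0..n)
        .map(|_| {
            thread::spawn(|| {
                PARSED.with(|p| assert_eq!(p.as_str(), "parsed"));
                PARSED.with(|p| assert_eq!(p.as_str(), "parsed"));
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    INIT_RUNS.load(Ordering::SeqCst) - before
}

fn main() {
    println!("initializer runs for 2 threads: {}", init_runs_for(2));
}
```

Each thread pays for the initialization once, no matter how many times it reads the value, but nothing is shared between threads.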

  • How do I make parsing occur only once per section?

    Don't use the output of pest directly.

    If you post-process the output of pest and create a structure that is Sync out of it, then you can store it with lazy_static and it will only be parsed once.
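The post-processing idea might look like the sketch below. To stay self-contained it makes several assumptions: `fake_parse` and `RawToken` are hypothetical stand-ins for `DumParser::parse` and pest's Rc-based output, `Token` is an invented owned model, and `std::sync::OnceLock` (Rust 1.70 or later) takes the place of lazy_static.

```rust
use std::rc::Rc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

static SECTION_1: &str = "Content to parse for section 1.";

// Counts parser invocations, only to show that parsing happens once.
static PARSE_RUNS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for pest's output: it holds an `Rc`, so it is not `Sync`.
struct RawToken {
    text: Rc<str>,
    start: usize,
    end: usize,
}

// Hypothetical stand-in for `DumParser::parse`: splits the input on spaces.
fn fake_parse(input: &str) -> Vec<RawToken> {
    PARSE_RUNS.fetch_add(1, Ordering::SeqCst);
    let mut tokens = Vec::new();
    let mut offset = 0;
    for word in input.split(' ') {
        let end = offset + word.len();
        if !word.is_empty() {
            tokens.push(RawToken { text: Rc::from(word), start: offset, end });
        }
        offset = end + 1; // skip the separating space
    }
    tokens
}

// Owned model with no `Rc` inside, so it is `Send + Sync`.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    text: String,
    start: usize,
    end: usize,
}

// Parses on first call, converts to the owned model, and caches the result.
fn parsed() -> &'static [Token] {
    static PARSED: OnceLock<Vec<Token>> = OnceLock::new();
    PARSED.get_or_init(|| {
        fake_parse(SECTION_1)
            .into_iter()
            .map(|raw| Token { text: raw.text.to_string(), start: raw.start, end: raw.end })
            .collect()
    })
}

fn main() {
    // Several threads read the shared result; the parser still runs once.
    let handles: Vec<_> = (0..4).map(|_| thread::spawn(|| parsed().len())).collect();
    for handle in handles {
        assert_eq!(handle.join().unwrap(), 6);
    }
    assert_eq!(PARSE_RUNS.load(Ordering::SeqCst), 1);
}
```

With real pest output, the conversion closure would walk the `Pairs` and copy out whatever the tests need (rule, span offsets, text) into owned fields. The same pattern should work with lazy_static or `once_cell::sync::Lazy` in place of `OnceLock`.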

    Actually, you could go further and avoid lazy_static entirely. If you can express the structure in a purely const way, then you could use a build.rs script or procedural macro to transform the string into a model at compile-time. For tests, though, this may not be worth the effort.