rust, borrow-checker

How to avoid reinstantiating regex multiple times?


I have written a solution for Advent of Code 2023, day 1 in Rust and got it to work. The general structure is like this:

fn parse_number(line: &str) -> Option<i32> {
  Regex::new(...).unwrap()...
}

fn parse_line(line: &str) -> (i32, i32) {
  let first_parsed = parse_number(&line).unwrap();
  let last_parsed = parse_number(&line).unwrap_or(first_parsed);
  (first_parsed, last_parsed)
}

fn day_1(path: &str) -> i32 {
  let file = fs::read_to_string(path).expect("ok");
  file
    .lines()
    .fold(0, |acc, line| {
      ...parse_line(...)...
    })
}

One possible performance optimization I see is to avoid recompiling the same regexes over and over again, so I want to do something like this:

fn parse_number(re: Regex, line: &str) -> Option<i32> {
  re...
}

fn parse_line(first_regex: Regex, second_regex: Regex, line: &str) -> (i32, i32) {
  let first_parsed = parse_number(first_regex, &line).unwrap();
  let last_parsed = parse_number(second_regex, &line).unwrap_or(first_parsed);
  (first_parsed, last_parsed)
}

fn day_1(path: &str) -> i32 {
  let first_regex = Regex::new(...).unwrap();
  let second_regex = Regex::new(...).unwrap();
  let file = fs::read_to_string(path).expect("ok");
  file
    .lines()
    .fold(0, |acc, line| {
      ...parse_line(&first_regex, &second_regex, &line)...
    })
}

The issue is that the borrow checker complains that I am moving the regex objects (which I obviously am). What are the common patterns to handle this? I would prefer to reuse the same two regex objects for all operations; failing that, I would settle for cloning them, which I assume is still faster than what I do now.


Solution

  • The methods on Regex take a reference to self as receiver (&self, not self), so you can just pass a reference instead of the owned value:

    use regex::Regex;
    fn parse_number(re: &Regex, line: &str) -> Option<i32> {
        todo!()
    }
    

    It's generally a good idea to take a reference instead of something owned when you can get away with it.
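
To show how the borrow threads through the whole call chain, here is a minimal, dependency-free sketch. `DigitFinder` is a hypothetical stand-in for `regex::Regex` (expensive to build, used only through `&self` methods), and `sum_lines` plays the role of `day_1`. Combining the two digits as `first * 10 + last` is an assumption of this sketch, since the question elides that part.

```rust
/// Hypothetical stand-in for `regex::Regex`: built once, then only read.
/// It is deliberately not `Clone`, so passing it by value would move it.
struct DigitFinder {
    digits: Vec<char>,
}

impl DigitFinder {
    fn new() -> Self {
        DigitFinder { digits: ('0'..='9').collect() }
    }

    // Like the matching methods on `Regex`, these take `&self`.
    fn first(&self, line: &str) -> Option<i32> {
        line.chars()
            .find(|c| self.digits.contains(c))
            .and_then(|c| c.to_digit(10))
            .map(|d| d as i32)
    }

    fn last(&self, line: &str) -> Option<i32> {
        line.chars()
            .rev()
            .find(|c| self.digits.contains(c))
            .and_then(|c| c.to_digit(10))
            .map(|d| d as i32)
    }
}

// Borrows the finder, so the caller keeps ownership.
fn parse_line(finder: &DigitFinder, line: &str) -> Option<(i32, i32)> {
    let first = finder.first(line)?;
    let last = finder.last(line).unwrap_or(first);
    Some((first, last))
}

fn sum_lines(input: &str) -> i32 {
    // Built once, outside the closure.
    let finder = DigitFinder::new();
    input.lines().fold(0, |acc, line| {
        // `&finder` can be handed out on every iteration because nothing is moved.
        match parse_line(&finder, line) {
            Some((first, last)) => acc + first * 10 + last, // assumed combination
            None => acc,
        }
    })
}
```

The closure passed to `fold` only captures `finder` by shared reference, which is why it can run for every line without the borrow checker objecting.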

    Since the regular expressions are more a property of the parse_line function, I'd instead use a lazily initialized static there, as tadman and Masklinn suggest:

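    use std::sync::OnceLock;

    fn parse_line(line: &str) -> (i32, i32) {
        // Compiled on the first call only; every later call reuses the same pair.
        static RE: OnceLock<(Regex, Regex)> = OnceLock::new();
        // The empty patterns are placeholders for the real expressions.
        let (first_regex, last_regex) =
            RE.get_or_init(|| (Regex::new("").unwrap(), Regex::new("").unwrap()));

        // `line` is already a `&str`, so it can be passed on directly.
        let first_parsed = parse_number(first_regex, line).unwrap();
        let last_parsed = parse_number(last_regex, line).unwrap_or(first_parsed);
        (first_parsed, last_parsed)
    }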
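
To check that a `OnceLock` static really initializes only once, a small standard-library-only sketch can count how often the initializer runs. The names `build_table`, `word_value` and `INIT_CALLS` are hypothetical, and the lookup table merely stands in for a compiled regex.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how often the (pretend-expensive) initializer runs.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for compiling a regex: builds a small lookup table.
fn build_table() -> Vec<(&'static str, i32)> {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    vec![("one", 1), ("two", 2), ("three", 3)]
}

fn word_value(word: &str) -> Option<i32> {
    // The static lives inside the function, so no caller has to pass it in.
    static TABLE: OnceLock<Vec<(&'static str, i32)>> = OnceLock::new();
    // `build_table` runs on the first call only; later calls get the stored value.
    let table = TABLE.get_or_init(build_table);
    table
        .iter()
        .find(|(name, _)| *name == word)
        .map(|(_, value)| *value)
}
```

Calling `word_value` any number of times should leave `INIT_CALLS` at 1, and the same holds when several threads race on the first call, because `get_or_init` guarantees that a single initializer wins.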