Tags: rust, borrow-checker, ownership, borrowing

How to call regexes in a loop without cloning the data


I am writing some code that applies N regexes to some contents and, if possible, I'd like to avoid cloning the string every time, since not every regex will actually match. Is that even possible? Here is what I tried:

use std::borrow::Cow;
use regex::Regex;

fn main() {
    let test = "abcde";
    let regexes = vec![
        (Regex::new("a").unwrap(), "b"),
        (Regex::new("b").unwrap(), "c"),
        (Regex::new("z").unwrap(), "-"),
    ];
    let mut contents = Cow::Borrowed(test);
    
    for (regex, new_value) in regexes {
        contents = regex.replace_all(&contents, new_value);
    }
    println!("{}", contents);
}

The expected result would be cccde (if it worked), with only two clones. But to make it work, I have to clone on every iteration:

use regex::Regex;

fn main() {
    let test = "abcde";
    let regexes = vec![
        (Regex::new("a").unwrap(), "b"),
        (Regex::new("b").unwrap(), "c"),
        (Regex::new("z").unwrap(), "-"),
    ];
    let mut contents = test.to_string();

    for (regex, new_value) in regexes {
        contents = regex.replace_all(&contents, new_value).to_string();
    }
    println!("{}", contents);
}

This outputs cccde, but with 3 clones. Is it possible to avoid that somehow? I know I could call every regex and rebind the result, but I do not control how many regexes there will be. Thanks in advance!

EDIT 1

For those who want to see the real code: it performs O(n^2) regex operations. It starts here https://github.com/jaysonsantos/there-i-fixed-it/blob/ad214a27606bc595d80bb7c5968d4f80ac032e65/src/plan/executor.rs#L185-L192 and calls this https://github.com/jaysonsantos/there-i-fixed-it/blob/main/src/plan/mod.rs#L107-L115

EDIT 2

Here is the new code, based on the accepted answer: https://github.com/jaysonsantos/there-i-fixed-it/commit/a4f5916b3e80749de269efa219b0689cb08551f2


Solution

  • You can do this by keeping a String as the persistent owner of the text while it is being replaced and, on each iteration, checking whether the Cow returned by replace_all is owned. If it is owned, you know a replacement actually happened, so you move the String out of the Cow and assign it to contents, the variable declared outside the loop. If it is borrowed, nothing matched and contents stays as it is.

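        // A String owns the current text across all iterations.
        let mut contents = test.to_owned();
        
        for (regex, new_value) in regexes {
            let new_contents = regex.replace_all(&contents, new_value);
            // Owned: a replacement produced a new String, so keep it.
            // Borrowed: nothing matched, so contents is left untouched.
            if let Cow::Owned(new_string) = new_contents {
                contents = new_string;
            }
        }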
    

    Note that assignment in Rust is a move by default for non-Copy types such as String. This means that new_string is moved into contents rather than copied, so the assignment itself does not allocate.
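
    As a minimal, self-contained sketch of the loop above, the following wraps it in a function and counts how many rules actually produced a new String. To keep it runnable without the regex crate, it assumes a hypothetical stand-in, replace_literal, that mimics only the return contract of replace_all (borrowed when nothing matched, owned otherwise) using plain substring replacement. The names replace_literal and apply_rules are illustrative and are not part of the original code.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for `Regex::replace_all` (an assumption of this
/// sketch): returns `Cow::Borrowed` when `pattern` does not occur in
/// `haystack`, and `Cow::Owned` with the replaced text when it does.
fn replace_literal<'h>(haystack: &'h str, pattern: &str, replacement: &str) -> Cow<'h, str> {
    if haystack.contains(pattern) {
        Cow::Owned(haystack.replace(pattern, replacement))
    } else {
        Cow::Borrowed(haystack)
    }
}

/// Applies every rule in order and returns the final text together with the
/// number of rules that actually produced a new `String`.
fn apply_rules(input: &str, rules: &[(&str, &str)]) -> (String, usize) {
    // One up-front copy: `contents` owns the text for the whole loop.
    let mut contents = input.to_owned();
    let mut replaced = 0;

    for &(pattern, replacement) in rules {
        let new_contents = replace_literal(&contents, pattern, replacement);
        // Only an owned result carries a new string; a borrowed one means
        // "no match", so `contents` is left as it is.
        if let Cow::Owned(new_string) = new_contents {
            contents = new_string; // moved, not copied
            replaced += 1;
        }
    }
    (contents, replaced)
}

fn main() {
    let rules = [("a", "b"), ("b", "c"), ("z", "-")];
    let (result, replaced) = apply_rules("abcde", &rules);
    println!("{} ({} replacements allocated)", result, replaced);
}
```

    With the three rules from the question, only the first two match, so the "z" rule costs no allocation beyond the search itself. Swapping replace_literal for regex.replace_all should leave the loop body unchanged, since both hand back a Cow.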
