What's the ideal way to trim extra spaces from a string?


I'm dealing with strings where I need to replace runs of multiple spaces with a single space. Most of these look like human error, but I'm curious about the ideal way to handle this, preferably with the fewest allocations when going from &str to String.

This is my approach so far:

const SPACE: &str = " ";
const TWO_SPACES: &str = "  ";

/// Replace multiple spaces with a single space
pub fn trim_whitespace(s: &str) -> String {
    let mut new_str: String = s.trim().to_owned();
    while new_str.contains(TWO_SPACES) {
        new_str = new_str.replace(TWO_SPACES, SPACE);
    }
    new_str
}

fn main() {
    let result = trim_whitespace("Hello     world! ");
    assert_eq!(result, "Hello world!");
}

Edit (10/2022): I come from a background in Python, where doing something like the above is quite idiomatic. For example, the fastest version in Python (to replace multiple spaces with a single space) appears to be this:

def trim_whitespace(s: str) -> str:
    s = s.strip()
    while '  ' in s:
        s = s.replace('  ', ' ')
    return s

Solution

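  • split_whitespace() is very convenient for this usage: it splits on any run of Unicode whitespace (tabs and newlines included, not just spaces) and never yields empty words, so leading and trailing whitespace disappears as well.

    The first solution is the simplest, but it allocates both a vector and a string.

    The second solution allocates only a string, but is a bit inelegant (an if at each iteration).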

    pub fn trim_whitespace_v1(s: &str) -> String {
        // first attempt: allocates a vector and a string
        let words: Vec<_> = s.split_whitespace().collect();
        words.join(" ")
    }
    
    pub fn trim_whitespace_v2(s: &str) -> String {
        // second attempt: only allocate a string
        let mut result = String::with_capacity(s.len());
        s.split_whitespace().for_each(|w| {
            if !result.is_empty() {
                result.push(' ');
            }
            result.push_str(w);
        });
        result
    }
    
    fn main() {
        let source = "  a   bb cc   ddd    ";
        println!("{:?}", trim_whitespace_v1(source)); // "a bb cc ddd"
        println!("{:?}", trim_whitespace_v2(source)); // "a bb cc ddd"
    }
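
Both versions above treat tabs and newlines as separators too, which is broader than the code in the question (it only collapses the " " character). If that matters, here is a minimal sketch of a space-only variant, assuming that only U+0020 should be collapsed and trimmed. The name collapse_spaces is made up for this example. Note that it also differs slightly from the question's code at the ends of the string: s.trim() there strips any whitespace, while this sketch strips only spaces.

```rust
/// Collapse runs of ASCII spaces (U+0020) into a single space and drop
/// leading/trailing spaces. Tabs and newlines are left untouched.
/// (Hypothetical helper, not part of the original answer.)
pub fn collapse_spaces(s: &str) -> String {
    // One allocation up front; the result is never longer than the input.
    let mut result = String::with_capacity(s.len());
    // split(' ') yields empty pieces between consecutive spaces and at the
    // ends of the string, so those are filtered out.
    for piece in s.split(' ').filter(|p| !p.is_empty()) {
        if !result.is_empty() {
            result.push(' ');
        }
        result.push_str(piece);
    }
    result
}
```

With the input from the question, collapse_spaces("Hello     world! ") returns "Hello world!", while a tab inside the string is kept as-is. Like trim_whitespace_v2, it allocates a single String.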