I'm dealing with strings where I need to replace multiple spaces with a single space. Most of these look like simple human error, but I'm curious about the ideal way to handle this, preferably with as few allocations (from `&str` to `String`) as possible.
So far, this is my approach:
```rust
const SPACE: &str = " ";
const TWO_SPACES: &str = "  ";

/// Replace multiple spaces with a single space
pub fn trim_whitespace(s: &str) -> String {
    // trim() strips leading/trailing whitespace of any kind
    let mut new_str: String = s.trim().to_owned();
    // Each replace() allocates a new String; loop until no double spaces remain
    while new_str.contains(TWO_SPACES) {
        new_str = new_str.replace(TWO_SPACES, SPACE);
    }
    new_str
}

fn main() {
    let result = trim_whitespace("Hello    world!  ");
    assert_eq!(result, "Hello world!");
}
```
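For comparison, here is a minimal sketch of a single-pass version that keeps the exact behavior of the loop above: it trims both ends, then collapses only runs of `' '`, leaving tabs and newlines inside the string alone. It allocates one `String` instead of one per `replace()` call. The name `collapse_spaces` is hypothetical.

```rust
/// Hypothetical single-pass alternative to the loop version: trims the ends,
/// then copies characters, keeping only the first space of each run.
/// Tabs/newlines inside the string are kept, as in the original loop.
pub fn collapse_spaces(s: &str) -> String {
    let trimmed = s.trim();
    // One allocation, sized for the worst case (nothing to collapse).
    let mut out = String::with_capacity(trimmed.len());
    let mut prev_was_space = false;
    for c in trimmed.chars() {
        if c == ' ' {
            // Push only the first space of a run of spaces.
            if !prev_was_space {
                out.push(' ');
            }
            prev_was_space = true;
        } else {
            out.push(c);
            prev_was_space = false;
        }
    }
    out
}

fn main() {
    assert_eq!(collapse_spaces("  Hello    world!  "), "Hello world!");
    // A tab between words is preserved; only the spaces around it collapse.
    assert_eq!(collapse_spaces("a  \t  b"), "a \t b");
}
```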
Edit (10/2022): I come from a background in Python, where doing something like the above is quite idiomatic. For example, the fastest version in Python (to replace multiple spaces with a single space) appears to be this:
```python
def trim_whitespace(s: str) -> str:
    s = s.strip()
    while '  ' in s:
        s = s.replace('  ', ' ')
    return s
```
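`split_whitespace()` is very convenient here: it splits on runs of whitespace and ignores leading and trailing whitespace, so it never yields empty pieces. Note that it treats tabs and newlines as separators too, not just spaces.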
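A small sketch of what `split_whitespace()` yields, to make the behavior above concrete. The input strings are made up for illustration. The iterator itself does not allocate; each item is a `&str` borrowing from the input, and the only allocation here is the `Vec` from `collect()`.

```rust
fn main() {
    // Mixed separators: spaces, a tab, a newline, and U+3000 (ideographic space).
    let s = " a\tbb \n  cc\u{3000}ddd ";
    // split_whitespace splits on any char for which char::is_whitespace is true
    // and never yields empty slices.
    let words: Vec<&str> = s.split_whitespace().collect();
    assert_eq!(words, ["a", "bb", "cc", "ddd"]);

    // By contrast, split(' ') yields an empty slice between consecutive spaces.
    let pieces: Vec<&str> = "a  b".split(' ').collect();
    assert_eq!(pieces, ["a", "", "b"]);

    println!("{:?}", words); // ["a", "bb", "cc", "ddd"]
}
```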
The very simple first solution allocates a vector and a string. The second solution allocates only a string, but is a bit inelegant (an `if` at each iteration).
```rust
pub fn trim_whitespace_v1(s: &str) -> String {
    // first attempt: allocates a vector and a string
    let words: Vec<_> = s.split_whitespace().collect();
    words.join(" ")
}

pub fn trim_whitespace_v2(s: &str) -> String {
    // second attempt: only allocate a string
    let mut result = String::with_capacity(s.len());
    s.split_whitespace().for_each(|w| {
        // add a separator before every word except the first
        if !result.is_empty() {
            result.push(' ');
        }
        result.push_str(w);
    });
    result
}

fn main() {
    let source = "  a  bb cc   ddd  ";
    println!("{:?}", trim_whitespace_v1(source)); // "a bb cc ddd"
    println!("{:?}", trim_whitespace_v2(source)); // "a bb cc ddd"
}
```
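Since the question is about minimizing `&str`-to-`String` allocations, one further option is to return `std::borrow::Cow<str>` and skip the allocation entirely when the input is already normalized. This is a minimal sketch, assuming "normalized" means exactly what `trim_whitespace_v1`/`v2` would output (so tabs and newlines count as whitespace to replace); `trim_whitespace_cow` and `is_normalized` are hypothetical names. It also handles the first word outside the loop, which removes the per-iteration `if` from the second solution.

```rust
use std::borrow::Cow;

/// Hypothetical helper: true if `s` already matches what v1/v2 would return
/// (no leading/trailing whitespace, every whitespace char is a single ' ').
fn is_normalized(s: &str) -> bool {
    // Start as if a space preceded the string, so a leading space fails.
    let mut prev_space = true;
    for c in s.chars() {
        if c.is_whitespace() {
            // Reject tabs/newlines/etc. and two whitespace chars in a row.
            if c != ' ' || prev_space {
                return false;
            }
            prev_space = true;
        } else {
            prev_space = false;
        }
    }
    // A trailing space leaves prev_space set; the empty string is fine.
    s.is_empty() || !prev_space
}

/// Hypothetical variant: borrows the input when nothing needs to change,
/// and allocates a single String only when it does.
pub fn trim_whitespace_cow(s: &str) -> Cow<'_, str> {
    if is_normalized(s) {
        return Cow::Borrowed(s);
    }
    let mut out = String::with_capacity(s.len());
    let mut words = s.split_whitespace();
    // First word outside the loop, so no per-word `if` is needed.
    if let Some(first) = words.next() {
        out.push_str(first);
        for w in words {
            out.push(' ');
            out.push_str(w);
        }
    }
    Cow::Owned(out)
}

fn main() {
    let clean = trim_whitespace_cow("a bb cc ddd");
    let messy = trim_whitespace_cow("  a\tbb cc   ddd  ");
    assert!(matches!(clean, Cow::Borrowed(_)));
    assert!(matches!(messy, Cow::Owned(_)));
    println!("{:?} {:?}", clean, messy); // "a bb cc ddd" "a bb cc ddd"
}
```

The trade-off is an extra scan over the input when it does need rewriting; whether that beats always allocating likely depends on how often the inputs are already clean.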