Tags: rust, utf-8, ascii

Slicing a string with Nordic letters in Rust


I am trying to slice a string that contains Nordic letters, but it panics with this error:

'byte index 1 is not a char boundary; it is inside 'å' (bytes 0..2) of å'

fn main() {
    let str = "äåö".to_string();
    println!("{}", &str[1..]);
}

Solution

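    fn main() {
        let str = "äåö".to_string();
        // char_indices() yields (byte_offset, char) pairs; nth(1) picks the
        // second character, and .0 is the byte offset where it starts.
        // unwrap() panics if the string has fewer than two characters.
        let slice_position = str.char_indices().nth(1).unwrap().0;
        println!("{}", &str[slice_position..]);
    }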
    
    åö
    

    The problem is that str is indexed by byte, not by character. The string is UTF-8 encoded, and ä takes two bytes in UTF-8, so slicing at byte 1 cuts the character in half. Rust refuses to produce a str that is not valid UTF-8, so it panics at runtime.
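To make the byte layout concrete, here is a minimal sketch that lists where each character starts. It assumes the letters are stored in their precomposed form (two bytes each in UTF-8); `char_starts` is a hypothetical helper name introduced only for this illustration.

```rust
/// Hypothetical helper: collects the byte offset at which each character starts.
fn char_starts(s: &str) -> Vec<usize> {
    s.char_indices().map(|(byte_offset, _)| byte_offset).collect()
}

fn main() {
    let s = "äåö";

    // Three characters, but six bytes (assuming precomposed letters).
    println!("chars = {}, bytes = {}", s.chars().count(), s.len());

    // Each character starts at an even byte offset here.
    println!("character starts: {:?}", char_starts(s));

    // Byte 1 falls inside 'ä', so it is not a valid place to slice.
    println!("is byte 1 a boundary? {}", s.is_char_boundary(1));
    println!("is byte 2 a boundary? {}", s.is_char_boundary(2));

    // `get` returns None instead of panicking on a bad boundary.
    println!("{:?}", s.get(1..));
    println!("{:?}", s.get(2..));
}
```

Using `str::get` instead of the indexing syntax is one way to turn the panic into an `Option` that can be handled explicitly.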

    str behaves this way because the position of the n-th character can't be determined without iterating over the string from the start up to that character. UTF-8 is a variable-length encoding, so the byte position of a character depends on the characters before it. That is why the solution uses char_indices() to find the byte offset first.
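If the unwrap() in the solution is a concern, the same idea can be wrapped in a small function that returns an Option instead of panicking on short strings. This is only a sketch: `slice_from_char` is a hypothetical name, not a standard library function, and it counts Unicode scalar values (Rust `char`s), not user-perceived grapheme clusters.

```rust
/// Hypothetical helper: returns the part of `s` starting at the `n`-th
/// character (0-based), or None if `s` has fewer than `n` characters.
fn slice_from_char(s: &str, n: usize) -> Option<&str> {
    s.char_indices()
        .map(|(byte_offset, _)| byte_offset)
        // Also allow slicing at the very end of the string (empty result).
        .chain(std::iter::once(s.len()))
        // Walks the string from the start; this is the linear scan that
        // variable-length encoding makes necessary.
        .nth(n)
        .map(|byte_offset| &s[byte_offset..])
}

fn main() {
    let s = "äåö";
    println!("{:?}", slice_from_char(s, 1));
    println!("{:?}", slice_from_char(s, 3));
    println!("{:?}", slice_from_char(s, 4));
}
```

Returning Option leaves the decision about out-of-range positions to the caller, in the same spirit as `str::get`.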