
Is there any difference between ranging over a string and ranging over a rune slice?


Ranging over a string:

package main

import (
    "fmt"
    "reflect"
)

func main() {
    str := "123456"
    for _, s := range str { // s is a rune decoded from the string
        fmt.Printf("type of s: %s, value: %v, string s: %s\n", reflect.TypeOf(s), s, string(s))
    }
}

https://play.golang.org/p/I1JCUJnN41h

And ranging over a rune slice ([]rune(str)):

package main

import (
    "fmt"
    "reflect"
)

func main() {
    str := "123456"
    for _, s := range []rune(str) { // convert to a rune slice first
        fmt.Printf("type: %s, value: %v, string: %s\n", reflect.TypeOf(s), s, string(s))
    }
}

https://play.golang.org/p/rJvyHH6lkl_t

I got the same results. Are they the same?


Solution

  • Yes, there is a difference. Given

    for i, c := range v {
    

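    c will be the same whether v is a string or a rune slice, but i will not: for a string, i is the byte offset where the current code point starts, while for a rune slice it is the element index (0, 1, 2, ...). The two only differ once the string contains a multibyte character. In the question, the index is discarded with _ and "123456" consists of single-byte ASCII digits, which is why both loops print the same output.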

    String Indexing

    Strings are sequences of bytes, and indexing a string (str[i]) returns a byte, just as it would for a byte slice. Unless you are intentionally reading or manipulating bytes rather than code points or characters, or you are sure your input contains no multibyte characters, use a rune slice wherever you are inclined to index a string.
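    A minimal sketch of the indexing point; byteAt and runeAt are hypothetical helper names, not standard library functions. Note that runeAt converts the whole string on every call, so if you need many rune lookups it is cheaper to convert once and keep the []rune around.

```go
package main

import "fmt"

// byteAt returns the i-th byte of s: indexing a string works on bytes.
func byteAt(s string, i int) byte { return s[i] }

// runeAt returns the i-th code point of s by converting to []rune first.
// The conversion decodes the entire string, so each call costs O(len(s)).
func runeAt(s string, i int) rune { return []rune(s)[i] }

func main() {
	s := "h\u00e9llo" // "héllo"; é (U+00E9) is encoded as the two bytes 0xC3 0xA9

	fmt.Printf("%#x\n", byteAt(s, 1))   // 0xc3: the first byte of é, not a character
	fmt.Printf("%q\n", runeAt(s, 1))    // 'é'
	fmt.Println(len(s), len([]rune(s))) // 6 5: bytes vs. runes
}
```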

    Range Loops are Special

    for i, c := range str {
    

    Range loops over strings are special. Instead of treating the string simply as a slice of bytes, range treats the string partly like a slice of bytes and partly like a slice of runes.

    i is the byte index of the first byte of the current code point, and c is that code point as a rune, decoded from one or more bytes of UTF-8. This means i can increase by more than one between iterations when the previous code point was a multibyte character.
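    A sketch of how i advances, assuming a string that mixes one-, two- and three-byte characters. describe is a hypothetical helper; utf8.RuneLen comes from the standard unicode/utf8 package and reports how many bytes a rune needs in UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe returns one line per rune in s: the byte offset the range loop
// reports, the rune itself, and the length of its UTF-8 encoding in bytes.
func describe(s string) []string {
	var out []string
	for i, c := range s {
		out = append(out, fmt.Sprintf("%d %q %d", i, c, utf8.RuneLen(c)))
	}
	return out
}

func main() {
	// "a" (1 byte), "é" (2 bytes), "日" (3 bytes), "b" (1 byte)
	for _, line := range describe("a\u00e9\u65e5b") {
		fmt.Println(line)
	}
	// Prints:
	// 0 'a' 1
	// 1 'é' 2
	// 3 '日' 3
	// 6 'b' 1
}
```

    Each offset equals the previous offset plus the previous rune's encoded length, which is exactly why i jumps from 1 to 3 and from 3 to 6.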

    Besides the axiomatic detail that Go source code is UTF-8, there's really only one way that Go treats UTF-8 specially, and that is when using a for range loop on a string. We've seen what happens with a regular for loop. A for range loop, by contrast, decodes one UTF-8-encoded rune on each iteration. Each time around the loop, the index of the loop is the starting position of the current rune, measured in bytes, and the code point is its value.

    The quoted passage is from the official Go Blog post "Strings, bytes, runes and characters in Go", which covers this in more detail.