Tags: string, go, unicode, unicode-escapes

How to convert "\\u0301" to "\u0301"


The problem I ran into: I have a string that contains several escaped substrings, such as "\\u0301", "\\b", "\\t", etc.

I need to convert all of them into the characters they represent: "\u0301", "\b", "\t", etc.

Note that simply removing a backslash does not work: the result is the plain text "\u0301" rather than the Unicode accent mark (U+0301, COMBINING ACUTE ACCENT).

A one-by-one solution is to replace each special sequence individually:

str = strings.Replace(str, "\\u0301", "\u0301", -1)
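If you stay with the one-by-one approach, strings.NewReplacer can at least apply all the replacements in a single left-to-right pass. This is a sketch under an assumption: the table below (unescapeKnown) is a hypothetical, hand-picked list, so any escape sequence not listed in it is left unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// unescapeKnown is a hypothetical, hand-picked table: each key is an escape
// sequence as it appears in the input (backslash included, hence the raw
// strings) and each value is the character it stands for. Sequences missing
// from the table are left untouched.
var unescapeKnown = strings.NewReplacer(
	`\u0301`, "\u0301",
	`\b`, "\b",
	`\t`, "\t",
	`\n`, "\n",
)

func main() {
	in := `e\u0301\tX` // contains the escape sequences, not the characters
	out := unescapeKnown.Replace(in)
	fmt.Println(out == "e\u0301\tX") // true

	// Not in the table, so it is not converted.
	fmt.Println(unescapeKnown.Replace(`\u00e9`) == `\u00e9`) // true
}
```

The obvious limitation is that every sequence you might encounter has to be listed by hand, which is what motivates a general solution.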

Is there a general solution that handles all escape sequences?


Solution

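  • If you need to convert a byte sequence that contains escaped Unicode sequences and control-character escapes into a Unicode string, you can use the strconv.Unquote function.

    To do this, convert the bytes to a string, escape the double-quote and newline characters, and add double-quote characters at the beginning and end of the string. The escaping is necessary because strconv.Unquote parses its argument as a Go double-quoted string literal, which must not contain a bare double quote or newline.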

    package main
    
    import (
        "fmt"
        "strconv"
        "strings"
    )
    
    func main() {
        b := []byte{65, 92, 117, 48, 51, 48, 49, 9, 88, 10, 34, 65, 34}
        // the same as
        // b := []byte("A\\u0301\tX\n\"A\"")
    
        // convert to string
        s := string(b)
    
        fmt.Println(s)
        fmt.Println("=========")
    
        // escape double quotes
        s = strings.ReplaceAll(s, "\"", "\\\"")
        // escape newlines
        s = strings.ReplaceAll(s, "\n", "\\n")
    
        r, err := strconv.Unquote(`"` + s + `"`)
        if err != nil {
            panic(err)
        }
        fmt.Println(r)
        fmt.Println("=========")
    }
    

    Output:

    A\u0301 X
    "A"
    =========
    Á  X
    "A"
    =========
    

    https://go.dev/play/p/WRsNGOT1zLR