
Cannot use some Unicode characters in a struct tag


I'm trying to build a program that extracts data from JSON and puts it into a custom struct. The JSON contains keys like "foo\u00a0", so I have to use struct tags to map these keys to fields.

I have this code:

package main

import (
    "encoding/json"
    "fmt"
)

type MyStruct struct {
    X string `json:"foobar\u0062"`
    Y string `json:"foobaz\u00a0"`
}

func main() {
    data := []byte(`{"foobar\u0062": "Bar", "foobaz\u00a0": "Baz"}`)

    var ms MyStruct
    err := json.Unmarshal(data, &ms)
    if err != nil {
        panic(err)
    }

    fmt.Printf("First: %s\n", ms.X)
    fmt.Printf("Second: %s\n", ms.Y)

}

But it prints:

First: Bar
Second: 

It does not print the second value.

I tested it with different values from the Latin-1 Supplement block, and apparently:

  • it works with 00b5 and 00f9
  • it does not work with 00a1, 00a2, 00ab, 00af, 00b0

My questions:

  1. Why can the string foobar\u0062 be used as a tag, but not foobaz\u00a0?
  2. If it's not possible, how can I get the value of a key such as foobaz\u00a0 from the JSON?

Solution

  • Struct tags themselves may contain special characters such as \u00a0, as this example shows:

    package main

    import (
        "fmt"
        "reflect"
    )

    type MyStruct struct {
        X string `json:"foobar\u0062"`
        Y string `json:"foobaz\u00a0"`
    }

    func main() {
        t := reflect.TypeOf(MyStruct{})

        for _, fieldName := range []string{"X", "Y"} {
            field, found := t.FieldByName(fieldName)
            if !found {
                continue
            }
            fmt.Printf("\nField: %s\n", fieldName)
            // The raw tag still contains the escape sequence as written.
            fmt.Printf("\tWhole tag value : %s\n", field.Tag)
            // Tag.Get unquotes the value, so \u0062 becomes 'b'.
            fmt.Printf("\tValue of 'json': %q\n", field.Tag.Get("json"))
        }
    }
    

    This outputs:

    Field: X
        Whole tag value : json:"foobar\u0062"
        Value of 'json': "foobarb"
    
    Field: Y
        Whole tag value : json:"foobaz\u00a0"
        Value of 'json': "foobaz\u00a0"
    

    But the encoding/json package is stricter and does not allow such characters in the tag name. The restriction is implemented by isValidTag() in encoding/json/encode.go:

    func isValidTag(s string) bool {
        if s == "" {
            return false
        }
        for _, c := range s {
            switch {
            case strings.ContainsRune("!#$%&()*+-./:;<=>?@[]^_{|}~ ", c):
                // Backslash and quote chars are reserved, but
                // otherwise any punctuation chars are allowed
                // in a tag name.
            case !unicode.IsLetter(c) && !unicode.IsDigit(c):
                return false
            }
        }
        return true
    }
    

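    So the json tag value of "foobar\u0062" is valid because '\u0062' is simply the 'b' character, which is a letter and therefore allowed.

    A json tag value of "foobaz\u00a0" is deemed invalid: '\u00a0' (no-break space) is neither a letter nor a digit, and it is not in the list of allowed punctuation, so isValidTag() rejects it. When the name is rejected, encoding/json ignores it and falls back to the Go field name, so the field Y is matched against a key named "Y" and the key "foobaz\u00a0" is never unmarshaled into it. This restriction is historical and was added so that a json key can also be used for other purposes, such as protobuf keys.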

    If you want to unmarshal such input JSON using the encoding/json standard library package, you can't use struct tags for these keys. Use a map instead, for example:

    data := []byte(`{"foobar\u0062": "Bar", "foobaz\u00a0": "Baz"}`)
    
    var m map[string]any
    err := json.Unmarshal(data, &m)
    if err != nil {
        panic(err)
    }
    fmt.Println("X:", m["foobar\u0062"])
    fmt.Println("Y:", m["foobaz\u00a0"])
    

    This will output:

    X: Bar
    Y: Baz