I know this is kinda weird thing to do, but it is part if an integration test, where I am testing whether system does not get tricked by \0
, sent as URL encoded %%0
. The system is indeed not tricked and echoes the value back as an XML error. But my test code fails then, trying to parse it. This is IMO valid XML:
<Error>
<Code>1234</Code>
<Message>Test�afterzero</Message>
</Error>
But this code would have an error, instead of a string with \0
in it:
package main
import (
"encoding/xml"
"fmt"
)
type ErrorMsg struct {
XMLName xml.Name `xml:"Error"`
Code string `xml:"Code"`
Message string `xml:"Message"`
}
func main() {
var data = []byte(`
<Error>
<Code>1234</Code>
<Message>Test�afterzero</Message>
</Error>
`)
b := ErrorMsg{}
o := xml.Unmarshal([]byte(data), &b)
fmt.Println(o)
fmt.Println(b)
}
See the example here: https://go.dev/play/p/1fKyuzZPUM8
Can I do something? I suppose I could just scrap this test - I did check it doesn't crash the system and it's really weird corner case, but I kinda would like to keep it.
Can I tell the XML to accept this somehow?
This is IMO valid XML
That is NOT valid XML. The �
character is not a valid XML character and cannot be in an XML document. Any XML parser will fail to parse.
Refer to the specs: https://www.w3.org/TR/xml/#charsets
Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
That character is not allowable in XML. So, you may need to find a different way to indicate what the character is if it's outside the allowable range.