Clojure: Escaping unicode `\U` in JSON encoding


Postamble for future readers

  • Elm allows literal C:\Users\myuser in strings
  • This is consistent with the JSON spec
  • My problem was unrelated to this, but several layers of escaping obscured the actual cause. Lesson for the future: producing a complete minimal working example would have revealed the error!

Original question

I have a Clojure backend that talks to an Elm frontend. I hit a bump when decoding JSON values in Elm.

In what follows, \U means the two literal characters backslash and U, as if read from a text file. "\\U" is the same string written in Clojure or Elm source code, where the backslash must be escaped. Note the enclosing double quotes.

Problem: encoding \U

The literal string \U, written as "\\U" in source code, is not accepted by the Elm string decoder.

A blog post suggests that to obtain the literal string \U, this should be encoded in source code as "\\\\U", "escaping the unicode escape".

The literal string I want to send to the client is C:\Users\myuser. I prefer to send valid JSON from the server to the client.

Clojure standard library behavior

clojure.data.json does not do anything special for strings containing the literal \U. The example below shows that \U and \m are treated equally: the backslash is escaped and the following character is left unchanged.

project.core> (clojure.data.json/write-str "C:\\Users\\myuser")
"\"C:\\\\Users\\\\myuser\""

Manual workaround

My temporary workaround is to manually escape the strings I need:

;; Replace each literal \U with \\U (doubling the backslash) before JSON encoding.
(defn escape-backslash-u [s]
  (clojure.string/replace s "\\U" "\\\\U"))
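
A quick round trip suggests why this workaround is probably not what is wanted: once the escaped string has been JSON-encoded and decoded again, it no longer equals the original path. This is only a minimal sketch, assuming clojure.data.json is on the classpath; the names original and round-tripped are made up for illustration.

```clojure
(require '[clojure.data.json :as json]
         '[clojure.string :as str])

;; Same workaround as above: doubles the backslash in front of each U.
(defn escape-backslash-u [s]
  (str/replace s "\\U" "\\\\U"))

;; The literal text C:\Users\myuser, written as a Clojure string.
(def original "C:\\Users\\myuser")

;; Escape manually, encode to JSON, then decode again.
(def round-tripped
  (json/read-str (json/write-str (escape-backslash-u original))))

(println round-tripped)
;; prints C:\\Users\myuser  (one extra backslash before Users)

(= original round-tripped)
;; => false
```

The JSON encoder already escapes every backslash, so the manual step adds a second, real backslash to the data itself.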

Concrete questions

  • Is clojure.data.json/write-str behaving correctly? As I understand the documentation, output should be valid unicode.
  • Are other JSON libraries behaving similarly?
  • Is Elm's Json.Decode behaving correctly by rejecting the literal string \U?

Solution

  • I think you may be on the wrong track here.

    The string you gave as an example, "C:\\Users\\myuser", is completely unproblematic: it does not contain any Unicode escape sequences. It is a string containing the ASCII characters ‘C’, ‘:’, ‘\’, ‘U’, and so on. The backslash is the escape character in Clojure strings, so it must itself be escaped to represent a literal backslash.

    In any case the string "C:\\Users\\myuser" can be serialized with (clojure.data.json/write-str "C:\\Users\\myuser"), and, as you know, this gives "\"C:\\\\Users\\\\myuser\"". All of this seems perfectly straightforward and sound.

    Printing "\"C:\\\\Users\\\\myuser\"" (for example with println) outputs the JSON text "C:\\Users\\myuser", quotes included. JSONLint accepts that text as valid, again as expected, and a JSON parser decodes it to the literal string C:\Users\myuser.
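
The layers of escaping described above can be checked at the REPL. The following is a minimal sketch, assuming clojure.data.json is available; the names path and encoded are chosen only for this example. It separates the string itself, the JSON text produced from it, and the value obtained by decoding that text.

```clojure
(require '[clojure.data.json :as json])

;; The literal text C:\Users\myuser, written as a Clojure string.
(def path "C:\\Users\\myuser")

;; JSON-encode it. The REPL would display this value as
;; "\"C:\\\\Users\\\\myuser\"" because it escapes the string again.
(def encoded (json/write-str path))

(println encoded)
;; prints "C:\\Users\\myuser"  (the actual JSON text sent over the wire)

;; Decoding the JSON text gives back the original string.
(= path (json/read-str encoded))
;; => true
```

The quadruple backslashes only appear when the REPL shows the JSON text as a Clojure string literal. The JSON text itself contains two backslashes per separator, and the decoded value contains one.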