Tags: json, delphi, delphi-xe6

How to convert TJsonString into string containing JSON?


Given a TJsonString object, how can I get a string representation of the JSON?

Given the following example (valid) JSON:

{
   "comment":"The quick bröwn fox\r\n\tjumped \"over\" the \\lazy\/ dog\r\nthen 💩 on a log."
}

In readable text, the comment would look like:

The quick bröwn fox  
    jumped "over" the \lazy/ dog  
then 💩 on a log.

We parse this JSON into an object:

var
  o: TJSONObject;

o := TJSONObject.ParseJSONValue(TEncoding.UTF8.GetBytes(JsonStr), 0, True) as TJSONObject;

We now have a JSON object in memory, and our goal is to convert it back into a JSON string.

How do I get Delphi XE6 to return the corresponding valid JSON string?

Research Effort

Attempt #1 - TJsonObject.ToString()

The call:

o.ToString

returns the (invalid) JSON:

{"comment":"The quick bröwn fox↵
    jumped \"over\" the \lazy/ dog↵
then 💩 on a log."}

The JSON is invalid because:

  • it isn't escaping CR into \r
  • it isn't escaping LF into \n
  • it isn't escaping TAB into \t
  • it isn't escaping \ into \\

(It also isn't escaping / into \/, but that escape is optional in JSON.)

This makes sense, because ToString() is meant to return human-readable text, not valid JSON.

Attempt #2 - TJsonObject.ToJson()

While ToString() is meant to return human-readable text, ToJSON() is meant to return valid JSON.

The only problem is that it doesn't exist in Delphi XE6.

Moving on!

Attempt #3 - TJsonObject.ToBytes()

Populate a byte array using the ToBytes() method:

var
  buffer: TBytes;
  n: Integer;
  s: RawByteString;

// Allocate a buffer large enough to hold the JSON
SetLength(buffer, o.EstimatedByteSize);

// Populate the buffer, then trim it to its actual length
n := o.ToBytes(buffer, 0);
SetLength(buffer, n);

// Copy the byte array into a RawByteString so we can inspect it
SetLength(s, n);
Move(buffer[0], s[1], n);

This produces the following JSON:

{"comment":"The quick br\u00F6wn fox\r\n\tjumped \"over\" the \\lazy\/ dog\r\nthen \uD83D\uDCA9 on a log."}

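While that is technically valid JSON, it needlessly escapes every code point above #127 (ö becomes \u00F6, and 💩 becomes the surrogate pair \uD83D\uDCA9). That's not ideal, and not what I'm asking for: JSON strings are allowed to contain any Unicode character other than the quotation mark, the backslash, and the control characters U+0000 to U+001F, which must be escaped.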

And to undo that, I'd have to write a JSON parser, plus code to convert the parsed value back into a JSON string, which is exactly the question I'm asking.
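The escapes in that ToBytes() output are mechanical: ö is U+00F6, and 💩 is U+1F4A9, which lies outside the Basic Multilingual Plane and is therefore written as a UTF-16 surrogate pair, each half escaped separately. A minimal sketch of the arithmetic; EscapeCodePoint is a hypothetical helper, not part of System.JSON, and it assumes a valid code point no larger than $10FFFF:

```pascal
program EscapeCodePointDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: the \uXXXX escape(s) JSON uses for one Unicode code point.
// Assumes 0 <= CodePoint <= $10FFFF. Code points above $FFFF are split into a
// UTF-16 surrogate pair, and each half is escaped separately.
function EscapeCodePoint(CodePoint: Integer): string;
var
  V: Integer;
begin
  if CodePoint <= $FFFF then
    Result := '\u' + IntToHex(CodePoint, 4)
  else
  begin
    V := CodePoint - $10000;                            // 20 significant bits remain
    Result := '\u' + IntToHex($D800 + (V shr 10), 4)    // high surrogate: top 10 bits
            + '\u' + IntToHex($DC00 + (V and $3FF), 4); // low surrogate: bottom 10 bits
  end;
end;

begin
  Writeln(EscapeCodePoint($00F6));   // \u00F6
  Writeln(EscapeCodePoint($1F4A9));  // \uD83D\uDCA9
end.
```

Both results match the \u00F6 and \uD83D\uDCA9 sequences in the ToBytes() output above.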

Attempt #4 - Fixup ToString()

Knowing that TJsonString.ToString() will return a string that is almost valid JSON:

'"The quick bröwn fox'#$D#$A#9'jumped \"over\" the \lazy/ dog'#$D#$A'then 💩 on a log."'

I can see that it's pretty close to being valid JSON. We just have to apply some JSON escaping rules:

  • #$D -> \r
  • #$A -> \n
  • #$9 -> \t
  • \ -> leave as-is
  • " -> leave as-is
  • / -> leave as-is

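But I don't know what happens with the other control characters below #32 (JSON requires every character from U+0000 to U+001F to be escaped). The \ rule is shaky too: ToString() turns " into \" but leaves a literal \ alone, so the fixed-up string can't reliably tell the two apart.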

Attempt #5 - Access TJsonString's protected TStringBuilder

The object maintains a protected buffer of the actual characters in the JSON:

  TJSONString = class(TJSONValue)
  protected
    FStrBuffer: TStringBuilder;

And these characters are exactly what the string should hold in memory:

'T', 'h', 'e', ' ', 'q', 'u', 'i', 'c', 'k', ' ', 'b', 'r', 'ö', 'w', 'n', ' ', 'f', 'o', 'x', #$D, #$A, 
#9, 'j', 'u', 'm', 'p', 'e', 'd', ' ', '"', 'o', 'v', 'e', 'r', '"', ' ', 't', 'h', 'e', ' ', '\', 'l', 'a', 'z', 'y', '/', ' ', 'd', 'o', 'g', #$D, #$A, 
't', 'h', 'e', 'n', ' ', #$D83D, #$DCA9, ' ', 'o', 'n', ' ', 'a', ' ', 'l', 'o', 'g', '.', #0

So, if I can reach inside the TJsonString, I can suck out the correct characters:

type
  TJsonStringFriend = class(TJsonString);

var
  s: UnicodeString;

s := TJsonStringFriend(AJsonString).FStrBuffer.ToString;

This returns exactly what the UnicodeString should contain:

'The quick bröwn fox'#$D#$A#9'jumped "over" the \lazy/ dog'#$D#$A'then 💩 on a log.'
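The TJsonStringFriend cast relies on Delphi's so-called "protected hack" (a class cracker): declaring a descendant class in your own unit gives code in that unit access to the protected members it inherits. Below is a self-contained sketch of the same technique against an RTL class instead of System.JSON, assuming TStrings keeps its BeginUpdate nesting depth in a protected UpdateCount property; TStringsAccess, UpdateDepth and DepthAfterOneBeginUpdate are hypothetical names. Like FStrBuffer, anything reached this way is an implementation detail that may change between Delphi versions.

```pascal
program ProtectedHackDemo;

{$APPTYPE CONSOLE}

uses
  System.Classes;

type
  // Class cracker: because this descendant is declared in this unit, code in
  // this unit can see the protected members it inherits from TStrings.
  TStringsAccess = class(TStrings);

// Reads the BeginUpdate nesting depth, which TStrings (assumed) exposes only
// through its protected UpdateCount property.
function UpdateDepth(List: TStrings): Integer;
begin
  Result := TStringsAccess(List).UpdateCount;
end;

// Creates a list, calls BeginUpdate once, and reports the depth seen via the cracker.
function DepthAfterOneBeginUpdate: Integer;
var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.BeginUpdate;
    Result := UpdateDepth(List);
    List.EndUpdate;
  finally
    List.Free;
  end;
end;

begin
  Writeln(DepthAfterOneBeginUpdate);  // 1
end.
```

The cast is only a hard typecast; no TStringsAccess instance is ever created, which is why leaving TStrings' abstract methods unimplemented does no harm.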

Now, all I have to do is escape the string back into JSON:

function EscapeJsonString(const s: UnicodeString): UnicodeString;
var
  i: Integer;
  ch: WideChar;
begin
  Result := '';
  for i := 1 to Length(s) do
  begin
    ch := s[i];
    case ch of
      '"': Result := Result + '\"';
      '\': Result := Result + '\\';
      '/': Result := Result + '\/';
      #$8: Result := Result + '\b';
      #$c: Result := Result + '\f';
      #$a: Result := Result + '\n';
      #$d: Result := Result + '\r';
      #$9: Result := Result + '\t';
    else
      // Any other control character below #32 must use the \uXXXX form
      if ch < WideChar(32) then
        Result := Result + '\u' + IntToHex(Ord(ch), 4)
      else
        Result := Result + ch;
    end;
  end;
end;

But to use that, we have to serialize every other JSON type ourselves too:

function JsonValueToJSON(JsonValue: TJSONValue; Indentation: string=''): UnicodeString;
var
  jsonArray: TJSONArray;
  jsonObject: TJSONObject;
  pair: TJSONPair;
  i: Integer;
  s: UnicodeString;
  ch: WideChar;
begin
  if JsonValue is TJSONArray then
  begin
    jsonArray := JsonValue as TJSONArray;
    Result := '[' + sLineBreak;
    for i := 0 to jsonArray.Count-1 do
    begin
      Result := Result + Indentation + '  ' + JsonValueToJSON(jsonArray.Items[i], Indentation + '  ');
      if i < jsonArray.Count-1 then
        Result := Result + ',';
      Result := Result + sLineBreak;
    end;
    Result := Result + Indentation + ']';
  end
  else if JsonValue is TJSONObject then
  begin
    jsonObject := JsonValue as TJSONObject;
    Result := '{' + sLineBreak;
    for i := 0 to jsonObject.Count-1 do
    begin
      pair := jsonObject.Pairs[i];
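      // NOTE: the key is emitted as-is; a key containing " or \ would need the same escaping as string values
      Result := Result + Indentation + '  "' + pair.JsonString.Value + '": ' + JsonValueToJSON(pair.JsonValue, Indentation + '  ');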
      if i < jsonObject.Count-1 then
        Result := Result + ',';
      Result := Result + sLineBreak;
    end;
    Result := Result + Indentation + '}';
  end
  else if JsonValue.ClassType = TJSONString then // not "is": TJSONNumber descends from TJSONString
  begin
    // Delphi doesn't know how to emit valid JSON; we'll do it ourselves
    Result := '"';
    //s := (JsonValue as TJsonString).ToString;
    s := TJsonStringFriend(JsonValue as TJsonString).FStrBuffer.ToString;
    for i := 1 to Length(s) do
    begin
      ch := s[i];
      case ch of
        '"': Result := Result + '\"';
        '\': Result := Result + '\\';
        '/': Result := Result + '\/';
        #$8: Result := Result + '\b';
        #$c: Result := Result + '\f';
        #$a: Result := Result + '\n';
        #$d: Result := Result + '\r';
        #$9: Result := Result + '\t';
        else
          if (ch < WideChar(32)) then
            Result := Result + '\u'+IntToHex(Ord(ch), 4)
          else
            Result := Result + ch;
      end;
    end;
    Result := Result+'"';
  end
  else
  begin
    // JsonValue is TJSONNumber, TJSONTrue, TJSONFalse, TJSONNull
    // I trust those know how to serialize themselves correctly into JSON
    Result := JsonValue.ToString;
  end;
end;

Except now I'm left hoping I got it all correct.

Surely this can't be what's intended? A six-hour rabbit hole into the internal minutiae of System.JSON, just to reinvent the wheel.
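For what it's worth, the escaping loop above can be packaged once as a standalone function and reused for string values and object keys alike. A minimal sketch, assuming the same escaping rules the loop uses (including the optional \/); JsonQuoteString is a hypothetical name, and it appends to a TStringBuilder rather than growing Result one character at a time:

```pascal
program JsonQuoteDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: wrap S in double quotes and escape it as a JSON string
// literal, using the same rules as the loop above.
function JsonQuoteString(const S: string): string;
var
  SB: TStringBuilder;
  Ch: Char;
begin
  SB := TStringBuilder.Create(Length(S) + 2);
  try
    SB.Append('"');
    for Ch in S do
      case Ch of
        '"': SB.Append('\"');
        '\': SB.Append('\\');
        '/': SB.Append('\/');  // optional in JSON; kept to match the loop above
        #8:  SB.Append('\b');
        #9:  SB.Append('\t');
        #10: SB.Append('\n');
        #12: SB.Append('\f');
        #13: SB.Append('\r');
      else
        if Ch < #32 then
          SB.Append('\u').Append(IntToHex(Ord(Ch), 4))  // remaining control characters
        else
          SB.Append(Ch);                                // everything else, copied as-is
      end;
    SB.Append('"');
    Result := SB.ToString;
  finally
    SB.Free;
  end;
end;

begin
  // Prints: "jumped \"over\" the \\lazy\/ dog\r\n"
  Writeln(JsonQuoteString('jumped "over" the \lazy/ dog'#13#10));
end.
```

Surrogate halves such as #$D83D and #$DCA9 are at least #32, so they are copied through unchanged and 💩 stays as itself rather than becoming \uD83D\uDCA9.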

Summary

How to convert TJsonValue into a JSON string?



Solution

  • The solution is to roll your own.

    First, the main function:

    class function TJsonHelper.ToJSON(const AJsonValue: TJsonValue): UnicodeString;
    begin
    {
        We have to do this most obvious thing ourselves, since Delphi gets it wrong.
    
        //WRONG: returns a human-readable string, but invalid JSON (e.g. does not convert CRLF into \r\n)
        Result := o.ToString;
    
        //INVALID: not defined in XE6
        Result := o.ToJSON;
    
        //WRONG: escapes every character above #127 as \uXXXX (e.g. ö becomes \u00F6)
        SetLength(buffer, o.EstimatedByteSize);
        n := o.ToBytes(buffer, 0);
        SetLength(buffer, n);
    }
        if AJsonValue = nil then
        begin
            Result := '';
            Exit;
        end;
    
        Result := PrettifyJsonValue(AJsonValue, '');
    end;
    

    Private Helper function

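    The actual guts live in a private PrettifyJsonValue() helper function. (In the unit's implementation section it has to appear before TJsonHelper.ToJSON, or be forward-declared, since Pascal requires identifiers to be declared before use.)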

    type
        // Crack open the JsonString, and feast on the tasty string builder inside.    
        TJsonStringFriend = class(TJsonString) 
        end;
    
    function PrettifyJsonValue(JsonValue: TJSONValue; Indentation: string=''): UnicodeString;
    var
        jsonArray: TJSONArray;
        jsonObject: TJSONObject;
        pair: TJSONPair;
        i: Integer;
        s: UnicodeString;
        ch: WideChar;
        sKey, sValue: UnicodeString;
    begin
        Assert(JsonValue <> nil, 'PrettifyJsonValue: JsonValue must not be nil');
    
        if JsonValue is TJSONArray then
        begin
            jsonArray := JsonValue as TJSONArray;
    
            // Emit "[]" for an empty array. This works around a Delphi System.JSON bug
            // where it cannot parse an empty array containing whitespace, such as:
            //  "cities": [ ]
            if jsonArray.Count <= 0 then
            begin
                Result := '[]';
                Exit;
            end;
    
            Result := '['+ sLineBreak;
            for i := 0 to jsonArray.Count-1 do
            begin
                Result := Result + Indentation + '  ' + PrettifyJsonValue(jsonArray.Items[i], Indentation + '  ');
                if i < jsonArray.Count-1 then
                    Result := Result + ',';
                Result := Result + sLineBreak;
            end;
            Result := Result + Indentation + ']';
        end
        else if JsonValue is TJSONObject then
        begin
            jsonObject := JsonValue as TJSONObject;
            Result := '{' + sLineBreak;
            for i := 0 to jsonObject.Count-1 do
            begin
                pair := jsonObject.Pairs[i];
    
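                sKey := pair.JsonString.Value;
                sValue := PrettifyJsonValue(pair.JsonValue, Indentation + '  ');

                // NOTE: sKey is emitted as-is; a key containing " or \ would need the same escaping as string values
                Result := Result + Indentation + '  "' + sKey + '": ' + sValue;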
                if i < jsonObject.Count-1 then
                    Result := Result + ',';
                Result := Result + sLineBreak;
            end;
            Result := Result + Indentation + '}';
        end
        else if JsonValue.ClassType = TJsonString then //TJsonNumber descends from TJsonString
        begin
            // Delphi doesn't know how to emit valid JSON; we'll do it ourselves
        //SURPRISE: TJsonNumber descends from TJsonString. No, I'm not joking. Not even a little.
            Result := '"';
    //      s := (JsonValue as TJsonString).ToString;
            s := TJsonStringFriend(JsonValue as TJsonString).FStrBuffer.ToString;
            for i := 1 to Length(s) do
            begin
                ch := s[i];
                case ch of
                '"': Result := Result + '\"';
                '\': Result := Result + '\\';
                '/': Result := Result + '\/';
                #$8: Result := Result + '\b';
                #$c: Result := Result + '\f';
                #$a: Result := Result + '\n';
                #$d: Result := Result + '\r';
                #$9: Result := Result + '\t';
                else
                    if (ch < WideChar(32)) then
                        Result := Result + '\u'+IntToHex(Ord(ch), 4)
                    else
                        Result := Result + ch;
                end;
            end;
            Result := Result+'"';
        end
        else
        begin
            // JsonValue is TJSONNumber, TJSONTrue, TJSONFalse, TJSONNull
            // I trust those know how to serialize themselves correctly into JSON
            Result := JsonValue.ToString;
        end;
    end;
    

    Bonus Chatter

    Delphi's parser also has a well-known bug where it is unable to parse some valid JSON, such as an empty array containing whitespace.

    You need to run your JSON string through a regex search-and-replace before parsing it:

    Search:  (\[[\s]*\])(?=(?:[^"]|"[^"]*")*$)
    Replace: []
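A sketch of applying that search-and-replace with TRegEx from System.RegularExpressions, assuming the intended pattern is (\[[\s]*\])(?=(?:[^"]|"[^"]*")*$) with the replacement []. CollapseEmptyArrays is a hypothetical name. The lookahead only accepts a match that is followed by balanced pairs of double quotes, i.e. one that is not inside a JSON string; it does not account for escaped quotes (\") inside strings.

```pascal
program FixEmptyArraysDemo;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

// Hypothetical helper: collapse "[ ]" (brackets holding only whitespace) to "[]",
// but only outside string literals. Escaped quotes (\") inside strings will
// throw off the quote counting in the lookahead.
function CollapseEmptyArrays(const Json: string): string;
const
  Pattern = '(\[[\s]*\])(?=(?:[^"]|"[^"]*")*$)';
begin
  Result := TRegEx.Replace(Json, Pattern, '[]');
end;

begin
  // Prints: {"cities": [], "note": "[ ]"}
  Writeln(CollapseEmptyArrays('{"cities": [ ], "note": "[ ]"}'));
end.
```

The "[ ]" inside the "note" string is left alone because only a single unmatched quote follows it, so the lookahead fails there.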