
How to tell json_encode to encode angle braces (e.g., < to \u003C)?


I'm working with a JSON string that contains Unicode-escaped characters, such as \u003C, and I want to decode the JSON into an associative array without converting these Unicode characters to their literal equivalents (e.g., \u003C should stay as it is and not convert to <).

Here’s a simplified example of my JSON and code:

// Original JSON string
$json = '{"Code_Injecting":{"Code":"\u003Cstyle\u003E div#example { display: none; }\u003C/style\u003E"}, "RemoveKey":"removeValue"}';

// Step 1: Decode JSON to an array
$array = json_decode($json, true);

// Check if decoding was successful
if (json_last_error() !== JSON_ERROR_NONE) {
    die('Error decoding JSON: ' . json_last_error_msg());
}

// Step 2: Unset the key "RemoveKey" if it exists
if (isset($array['RemoveKey'])) {
    unset($array['RemoveKey']);
}

// Step 3: Encode it back to JSON while preserving Unicode-escaped characters
$newJson = json_encode($array, JSON_UNESCAPED_SLASHES);

// Output the final JSON
echo $newJson;

// Expected Output:
// {"Code_Injecting":{"Code":"\u003Cstyle\u003E div#example { display: none; }\u003C/style\u003E"}}

I need the Unicode-escaped characters (like \u003C) to remain unchanged when decoding JSON. Is there a way to achieve this in PHP?

I’ve tried looking into JSON_UNESCAPED_UNICODE during encoding but didn’t find an equivalent for decoding. Any help or suggestions would be appreciated!


Solution

  • If you only need < and > to be encoded again, setting the JSON_HEX_TAG flag when re-encoding the data is enough.
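A minimal sketch of the JSON_HEX_TAG route, assuming the data has already been decoded into an array. The sample array is made up for illustration, and JSON_UNESCAPED_SLASHES is added only so that "/" is not written as "\/":

```php
<?php
// Hypothetical sample data; any string containing < or > behaves the same way.
$data = ['Code' => '<style> div#example { display: none; }</style>'];

// JSON_HEX_TAG makes json_encode() write < as \u003C and > as \u003E.
// JSON_UNESCAPED_SLASHES keeps "/" from being written as "\/".
$encoded = json_encode($data, JSON_HEX_TAG | JSON_UNESCAPED_SLASHES);

echo $encoded, PHP_EOL;
// {"Code":"\u003Cstyle\u003E div#example { display: none; }\u003C/style\u003E"}
```

Note that this re-escapes every < and > in the output, including ones that were literal characters in the original JSON, and it does not restore escapes for any other character.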

    If you want all such Unicode escape sequences left in place, you first need to replace each \uXXXX with \\u0075XXXX, then manipulate and re-encode your data, and finally replace \u0075 with a plain u again:

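    // Turn each \uXXXX in the raw JSON into \\u0075XXXX (matches uppercase hex digits only)
    $json = preg_replace('#\\\\u([0-9A-F]{4})#', '\\\\\u0075$1', $json);
    $dec = json_decode($json, true);
    // ... manipulate $dec here ...
    // Re-encode, then turn the \u0075 marker back into a plain u
    $enc = preg_replace('#\\\\u0075#', 'u', json_encode($dec));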
    

    If you do it like this, be aware that the decoded data contains the marker sequences as literal text: each original \uXXXX shows up in the decoded strings as \u0075XXXX. So if you want to replace < in one of the object's string values at that stage, you can't search for <; you need to search for \u0075003C instead.
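The round trip can be wrapped in two small helpers, sketched below under a few assumptions. The function names are hypothetical, the sample JSON is a shortened version of the one in the question, and the `i` flag is an addition so that lowercase hex digits such as \u003c are also protected:

```php
<?php
// Hypothetical helper names; the technique is the marker replacement from the answer.

// Turn every \uXXXX in the raw JSON text into \\u0075XXXX, so that json_decode()
// sees an escaped backslash followed by ordinary characters.
function protectUnicodeEscapes(string $json): string
{
    // The "i" flag is an assumption added here to cover lowercase hex digits.
    return preg_replace('#\\\\u([0-9A-F]{4})#i', '\\\\\u0075$1', $json);
}

// Undo the marker after re-encoding: \\u0075XXXX becomes \uXXXX again.
function restoreUnicodeEscapes(string $json): string
{
    return preg_replace('#\\\\u0075#', 'u', $json);
}

$json = '{"Code_Injecting":{"Code":"\u003Cstyle\u003E"},"RemoveKey":"removeValue"}';

$data = json_decode(protectUnicodeEscapes($json), true);

// While decoded, the value carries the marker text, not a literal "<".
$decodedValue = $data['Code_Injecting']['Code'];

unset($data['RemoveKey']);

$result = restoreUnicodeEscapes(json_encode($data, JSON_UNESCAPED_SLASHES));

echo $decodedValue, PHP_EOL; // \u0075003Cstyle\u0075003E
echo $result, PHP_EOL;       // {"Code_Injecting":{"Code":"\u003Cstyle\u003E"}}
```

One limitation of this sketch: a JSON string that already contains an escaped backslash followed by uXXXX (that is, \\uXXXX in the raw text) would also match the first pattern, so input of that kind needs separate handling.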