I have a problem with simplexml_load_string
erring with parse errors due to an xml payload coming from a database with unicode characters in it.
I'm at a loss how to get php to read this and use the xml like I normally would. The code has been working fine until people were getting creative with data being submitted.
Unfortunately I cannot modify the source data, I have to work with what I receive, to give you an idea, one field that's breaking it in the original raw receipt looks like :
<FirstName>🐺</FirstName>
Previously the code works fine by parsing the xml with a simple line of :
$xmlresult = simplexml_load_string($result, 'SimpleXMLElement',LIBXML_NOCDATA);
However with these unicode characters, it just errors. Depending on what I use to view the data if I dump the raw payload it can look like:
<d83d><dc3a>
or <U+D83D><U+DC3A>
Reading a bit on stack, it seemed DOM might work but didn't have any luck there either.
The incoming payload does have the header:
?xml version="1.0" encoding="UTF-8"?>
data comes in via
<data type="cdata"><![CDATA[<payload>
I'm at a complete loss, hopefully can get some help here to get me over this hump with this data handling.
I've been staring at this for days and it seems one thing I didn't try was to wrap my curl call function with utf8_encode like this :
$result = utf8_encode(do_curl($xmlbuildquery));
My do_curl function is just a separate function to call the curl procedure, nothing more. Doing that, I'm able to parse the results, instead of those unicode characters showing up, instead its displaying as
[firstname] => í ½í°º
(the above is result of print_r($result); after
$xmldata = simplexml_load_string((string)$xmlresult->body->function->data);
With that in place the xml is now parsing finally. Oddly this sparked my curiosity further as this information is provided via csv thats imported into a mysql database and when I look up the same record its shown as :
FirstName: ????
with the table type set too :
FirstName
varchar(40) COLLATE utf8mb4_unicode_ci NOT NULL,
That might suggest their not utf8_encoding the output to the csv perhaps, separate from this issue but just interesting.
And finally, my script is able to run again!!