
Escaping an ampersand programmatically in XML does not seem to work: the payload gets cut off


I have an Arduino board from which I am trying to POST XML data to a server. I am having issues with & in the XML, so I first tested with a browser: I created an HTML form, submitted the data, captured the headers on the server, and tried to replicate the request on the board.

The HTML form:

<form id= "pData" action="#" method="post" > 
    <textarea  name="postData" ></textarea>
    <input type='submit' value=' Go '/> 
</form>

Capturing the data on the PHP server:

if(isset($_POST['postData'])){
    $input_headers="";
    $file = 'xmlErrors.txt';
    foreach ($_SERVER as $name => $value) {
        $input_headers.= "$name: $value\n";
    }

    $settings=$input_headers."\r\n".$_POST['postData'];
    file_put_contents($file,$settings, FILE_APPEND );
}

This is how I send it from the board. I also print the data on the serial monitor to make sure it looks right.

String request = "POST /test.php HTTP/1.1\r\nHost: example.com\r\nAccept-Encoding: gzip, deflate \r\nCache-Control:max-age=0\r\nUser-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36\r\nAccept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\nAccept-Language:en-US,en;q=0.9,pl;q=0.8\r\nReferer:example.com\r\nContent-Type: application/x-www-form-urlencoded\r\n";
String payload = "<?xml version='1.0' encoding='UTF-8'?><response>a=1&amp;b=2</response>";
//Serial.print("length: ");Serial.println(payload.length());
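// The body is "postData=" (9 characters) followed by the payload.
// A blank line (\r\n\r\n) must separate the headers from the body.
request += "Content-Length: " + String(payload.length() + 9) + "\r\nConnection: Close\r\n\r\npostData=" + payload;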
client.print(request);

When I test it from the browser, data is captured correctly. But when I send it from the board, it gets cut off at

<?xml version='1.0' encoding='UTF-8'?><response>a=1

I also tried altering the payload like this:

<?xml version='1.0' encoding='UTF-8'?><response>a=1&#038;b=2</response>

No matter what I try, it just gets cut off. I matched pretty much every HTTP header that made sense to me on the board, still to no avail. Without the &, the server receives the entire payload from the board, so I believe the rest of the code on the board is doing its job.

Any clues?


Solution

  • For anyone who ends up with the same issue, here is what happened.

    The web browser is submitting the data with

    CONTENT_TYPE: application/x-www-form-urlencoded
    

    This means the browser turns & into %26, and that is how it passes through. On the board, the header is present, but the & is XML-escaped rather than URL-encoded. PHP splits a form-encoded body into fields at every raw &, so postData ends at a=1 and the rest of the payload is parsed as a separate field. Changing & to %26 gets the data through $_POST intact. Once you have the data, XML-escape the & as &amp; so that it passes XML parsing.

    Thanks to Wireshark :)
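
    Rather than replacing & by hand, the whole field value can be percent-encoded before it goes into the body. The following is a minimal sketch in portable C++ using std::string; the function names percentEncode and buildFormBody are made up for illustration and are not part of any Arduino library. It encodes every byte outside the unreserved set (letters, digits, - _ . ~), which is stricter than a browser but decodes to the same value on the PHP side.

```cpp
#include <string>

// Percent-encode a value for an application/x-www-form-urlencoded body.
// Unreserved characters (letters, digits, '-', '_', '.', '~') pass through;
// every other byte becomes %XX, so '&' becomes %26 and '=' becomes %3D.
// Hypothetical helper for illustration, not an Arduino API.
std::string percentEncode(const std::string& value) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(value.size() * 3);  // worst case: every byte is encoded
    for (unsigned char c : value) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Build the request body as "name=encoded-value".
// Content-Length must be the size of this returned string,
// i.e. the length after encoding, not the length of the raw payload.
std::string buildFormBody(const std::string& fieldName, const std::string& value) {
    return fieldName + "=" + percentEncode(value);
}
```

    On the board, the same loop should carry over to the Arduino String class with only small changes. For example, buildFormBody("postData", payload) would replace the hand-built "postData=" + payload, and its length would go into the Content-Length header.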