I have run into what seems to be a very weird issue.
To begin with, it happens only on Ubuntu 18.04 running under VMware on a Windows 10 host.
I am sending a multipart request using libcurl, and once the size of the data passed to curl_mime_data exceeds a certain amount, the data gets corrupted: from a certain offset onward it is overwritten with a copy of the beginning. For example, 'abcdefgh' would become 'abcabcab'.
I see the corrupted data both on the receiving side and in Wireshark on the sender, so it really is being sent this way. I managed to reproduce the behavior with the toy program below:
#include <assert.h>
#include <stdio.h>
#include <fstream>
#include <iterator>
#include <string>
#include <curl/curl.h>

int main()
{
    curl_global_init(0);
    CURL *curl = curl_easy_init();
    assert(curl != nullptr);
    curl_mime *form = nullptr;
    curl_mimepart *field = nullptr;
    CURLcode res = CURLE_OK;

    form = curl_mime_init(curl);

    // Read the whole sample file into memory.
    std::ifstream t("test.json");
    if (!t.good())
    {
        return 1;
    }
    std::string str;
    t.seekg(0, std::ios::end);
    str.reserve(t.tellg());
    t.seekg(0, std::ios::beg);
    str.assign(std::istreambuf_iterator<char>(t), std::istreambuf_iterator<char>());

    {
        field = curl_mime_addpart(form);
        res = curl_mime_data(field, str.c_str(), str.size());
        assert(res == CURLE_OK);

        // Dump the buffer right before the transfer, for comparison with what is sent.
        {
            FILE *f = fopen("out.json", "w");
            assert(f != nullptr);
            fwrite(str.data(), 1, str.size(), f);
            fclose(f);
        }

        res = curl_mime_name(field, "response");
        assert(res == CURLE_OK);
    }

    curl_easy_setopt(curl, CURLOPT_TIMEOUT, 10L); // this option expects a long
    curl_easy_setopt(curl, CURLOPT_URL, "http://1.2.3.4");
    curl_easy_setopt(curl, CURLOPT_MIMEPOST, form);

    res = curl_easy_perform(curl);
    assert(res == CURLE_OK);

    curl_mime_free(form);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
Here I'm reading a sample file from disk and writing it back out just before passing it to curl, for comparison. At that point the two files are identical, but the data that actually gets sent differs from the contents of the original file.
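To pin down where the sent data starts to diverge, a small comparison helper can be handy. This is only a sketch: first_mismatch and line_of_offset are names I made up for illustration, and it assumes both the original file and the captured body have already been loaded into strings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Offset of the first byte where `sent` differs from `original`,
// or std::string::npos when both buffers are identical.
// If one buffer is a prefix of the other, the shorter length is returned.
std::size_t first_mismatch(const std::string &original, const std::string &sent)
{
    const std::size_t common = std::min(original.size(), sent.size());
    const auto diff = std::mismatch(original.begin(), original.begin() + common, sent.begin());
    const std::size_t offset = static_cast<std::size_t>(diff.first - original.begin());
    if (offset == common && original.size() == sent.size())
        return std::string::npos;
    return offset;
}

// 1-based line number that contains byte `offset` of `text`.
std::size_t line_of_offset(const std::string &text, std::size_t offset)
{
    assert(offset <= text.size());
    return 1 + static_cast<std::size_t>(std::count(text.begin(), text.begin() + offset, '\n'));
}
```

Calling first_mismatch on the contents of out.json and the body captured in Wireshark gives the byte offset of the first bad byte, and line_of_offset turns that into a line number in the original file.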
As mentioned at the beginning, this only happens in a very specific environment: Ubuntu 18.04 on VMware Workstation 12 on Windows 10. I see it on two different instances of that Ubuntu setup. This does NOT happen when running on:
I'm kind of out of ideas about where to look or what could possibly be causing this. I'm using libcurl 7.56 and gcc 7.3.0.
Could you please throw some ideas at me? Am I using libcurl wrong? What else could be wrong or worth trying?
Just tested it on Ubuntu 16.04, and the data was corrupted there too. As for the data I'm using, here's the original JSON file: https://www.dropbox.com/s/gfk0b61tyel68wu/test.json?dl=0 and here's what I'm getting from Wireshark: https://www.dropbox.com/s/dwlzwhn755c51cf/bad.json?dl=0 The problem starts at line 779. (This is random JSON produced by a JSON generator, not real data.)
This appears to be a bug in curl 7.56. Whatever it was, it is already fixed in curl 7.61.1 (the latest at this time). Further investigation shows that the exact commit that fixed it is 5f9e2ca09b57d82baf239039835b3b06dc41bbc5
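If you want a program to refuse to run against an affected libcurl, a version comparison along these lines might help. It is a rough sketch: pack_version and at_least are hypothetical helpers, and the 7.61.1 threshold is simply the version I confirmed as working, not necessarily the first release that contains the fix. libcurl encodes its version number as 0xXXYYZZ (major, minor, patch).

```cpp
#include <cassert>

// Packs major.minor.patch the way libcurl's numeric version does: 0xXXYYZZ.
unsigned int pack_version(unsigned int major, unsigned int minor, unsigned int patch)
{
    assert(major < 256 && minor < 256 && patch < 256);
    return (major << 16) | (minor << 8) | patch;
}

// True when `version_num` (in 0xXXYYZZ form) is at least major.minor.patch.
bool at_least(unsigned int version_num, unsigned int major, unsigned int minor, unsigned int patch)
{
    return version_num >= pack_version(major, minor, patch);
}
```

In a program linked against libcurl, the runtime value to pass in would be curl_version_info(CURLVERSION_NOW)->version_num, and the compile-time one is the LIBCURL_VERSION_NUM macro, so the check would read at_least(version_num, 7, 61, 1).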
As I found out, this particular bug was fixed by the commit "mime: fix the content reader to handle >16K data properly".
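For intuition only, here is a toy model of how a reader that hands data out in fixed-size chunks could produce the "beginning repeated" pattern if it failed to advance its read offset. This is NOT libcurl's code and I have not verified that this is the actual mechanism of the bug; send_in_chunks and its advance_offset flag are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Copies `data` to a simulated wire buffer in pieces of at most `chunk` bytes.
// With advance_offset == false every piece is read from offset 0 again,
// so the output has the right length but repeats the start of the input.
std::string send_in_chunks(const std::string &data, std::size_t chunk, bool advance_offset)
{
    assert(chunk > 0);
    std::string wire;
    std::size_t offset = 0;
    while (wire.size() < data.size()) {
        const std::size_t n = std::min(chunk, data.size() - wire.size());
        wire.append(data, offset, n); // n bytes starting at `offset`
        if (advance_offset)
            offset += n;
    }
    return wire;
}
```

With a 3-byte chunk, send_in_chunks("abcdefgh", 3, true) returns the input unchanged, while passing false yields "abcabcab", which has the same shape as the corruption described in the question.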