
Improving points-per-second write throughput with InfluxDB


I am trying to improve writing performance between a C client program and a single node of InfluxDB.

Currently my record is 2.526K writes per second, as seen in the screenshot below:

My C program is basically an infinite loop that sends HTTP POST requests using libcurl.

Here is the code responsible for the POST requests:

int configure_curl_easy_operation(CURL *curl_easy_handler)
{
  // using this doc page https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
  // behavior options
  curl_easy_setopt(curl_easy_handler, CURLOPT_VERBOSE, 1L);

  // callback options

  // error options

  // network options
  //curl_easy_setopt(curl_easy_handler, CURLOPT_URL, "http://localhost:8086/ping"); an old test
  curl_easy_setopt(curl_easy_handler, CURLOPT_URL, "http://localhost:8086/write?db=XXX_metrics");
  curl_easy_setopt(curl_easy_handler, CURLOPT_HTTP_CONTENT_DECODING, 0L);
  curl_easy_setopt(curl_easy_handler, CURLOPT_TRANSFER_ENCODING, 0L);
  //curl_easy_setopt(curl_easy_handler, CURLOPT_HTTPHEADER, )// work here
  curl_easy_setopt(curl_easy_handler, CURLOPT_PROTOCOLS, CURLPROTO_HTTP);
  curl_easy_setopt(curl_easy_handler, CURLOPT_POST, 1L);
  curl_easy_setopt(curl_easy_handler, CURLOPT_REDIR_PROTOCOLS, 0L);
  curl_easy_setopt(curl_easy_handler, CURLOPT_DEFAULT_PROTOCOL, "http");
  curl_easy_setopt(curl_easy_handler, CURLOPT_FOLLOWLOCATION, 0L);
  //curl_easy_setopt(curl_easy_handler, CURLOPT_HTTPHEADER, NULL);

  // NAMES and PASSWORDS OPTIONS

  // HTTP OPTIONS
  // curl_easy_setopt(curl_easy_handler, CURLOPT_HTTPGET, 0L);

  // SMTP OPTIONS

  // TFTP OPTIONS

  // FTP OPTIONS

  // RTSP OPTIONS

  // PROTOCOL OPTIONS

  if (curl_easy_setopt(curl_easy_handler, CURLOPT_POSTFIELDS, "metrics value0=0,value1=872323,value2=928323,value3=238233,value4=3982332,value5=209233,value6=8732632,value7=4342421,value8=091092744,value9=230944\nmetrics value10=0,value11=872323,value12=928323,value13=238233,value14=3982332,value15=209233,value16=8732632,value17=4342421,value18=091092744,value19=230944") != CURLE_OK)
    return (1);
  //curl_easy_setopt(curl_easy_handler, CURLOPT_MIMEPOST, mime);

  // CONNECTION OPTIONS

  // SSL and SECURITY OPTIONS

  // SSH OPTIONS

  // OTHER OPTIONS

  // TELNET OPTIONS
  return (0);
}

int do_things(t_contexts_handlers *ctxts_handlers)
{
  while (g_running)
    {
      if ((configure_curl_easy_operation(ctxts_handlers->curl.curl_easy_handler)) != 0)
        {
          fprintf(stderr, "Stop running after an error occurred before making a curl operation\n");
          g_running = 0;
          continue;
        }
      if (curl_easy_perform(ctxts_handlers->curl.curl_easy_handler) != CURLE_OK)
        fprintf(stderr, "an error occurred\n");
    }
  return (0);
}

  1. I don't use threads (so far).
  2. I use the easy API (so far).
  3. I've changed some configuration settings (but they didn't improve performance):
       access-log-path : "/dev/null"
       pprof-enabled : false
       unix-socket-enabled : false
       [ifql] enabled : false
       [subscriber] enabled : false

Do you have any ideas for improving performance?

EDIT: As you can see, the first screenshot is not the one corresponding to the C code above. Here is the correct one:


Solution

  • Try posting data in batches of 1,000 to 10,000 points per POST. The batch has to be large enough for the gain to become noticeable, so you'll have to experiment to find the optimum.

    It is also better to put an explicit, distinct timestamp on each line; otherwise InfluxDB will treat all lines as having the same timestamp. In your case, points that share the same measurement, tag set, and timestamp are considered ONE data point: fields with the same key overwrite each other, and only one point is kept in the database.
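
The batching advice above can be sketched roughly as follows. This is only an illustration, not the asker's code: `build_batch` is a hypothetical helper, the field names and values are made up, and timestamps are assumed to be in nanoseconds (the default precision of the InfluxDB 1.x `/write` endpoint). It only builds the request body; sending it is left to the existing libcurl code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes `count` line-protocol points into `buf`,
 * one per line, each with its own timestamp (base_ns + i * step_ns).
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if `buf` is too small to hold the whole batch. */
static long build_batch(char *buf, size_t cap, size_t count,
                        long long base_ns, long long step_ns)
{
    size_t used = 0;

    if (cap > 0)
        buf[0] = '\0';              /* keep buf a valid string when count == 0 */

    for (size_t i = 0; i < count; i++) {
        /* "metrics", "value0" and "value1" are illustrative names only. */
        int n = snprintf(buf + used, cap - used,
                         "metrics value0=%zu,value1=872323 %lld\n",
                         i, base_ns + (long long)i * step_ns);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;              /* output error or buffer too small */
        used += (size_t)n;
    }
    return (long)used;
}
```

The resulting buffer can then be handed to libcurl with `CURLOPT_POSTFIELDS`, together with `CURLOPT_POSTFIELDSIZE` set to the returned length. Note that libcurl does not copy the data passed via `CURLOPT_POSTFIELDS`, so the buffer has to stay valid until `curl_easy_perform` returns.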