Search code examples
c++performancecurlnetwork-programmingcpp-netlib

Where is a delay in an HTTP POST coming from?


I am developing a web service on Ubuntu 14.04 in C++ using cpp-netlib in asynchronous mode. The service needs to respond to data sent over HTTP POST in the message body. I am observing poor performance when receiving input larger than 1K, and I want to fix that.

If the data is relatively small, less than 1K, the server receives the data almost instantly. If the data is more than 1K, there is a roughly one second delay after the asynchronous read callback function is first invoked before the first chunk of actual data is presented to the callback. After the initial delay, the subsequent chunks arrive almost instantly.

How can I eliminate this delay? Is it an artifact of using curl to post test data? How can I easily test the performance of curl for posting data?

You can find minimal source code demonstrating the problem on github. Here's the command I use to post data to the server:

rcook$ curl -d @AsyncDaemon.h http://localhost:8787/foo

Here's an example of its output (annotated):

rcook$ ./async_daemon 
1431387368.321863: AsyncDaemon constructor
1431387368.322446: receive thread beginning

*** It's waiting for a connection here.

1431387371.536191: begin transaction 0 on thread 24050
1431387371.536237: transaction 0 constructor
1431387371.536273: received 1206 byte request for /foo from 127.0.0.1:49402
1431387371.536312: invoked asynchronous read
1431387371.536321: end transaction handler
1431387371.536335: begin asynchronous read callback on thread 24050
1431387371.536348: read 0 bytes
1431387371.536386: invoked asynchronous read
1431387371.536394: end asynchronous read callback

*** The asynchronous read callback is invoked quickly, but gets no data.
*** There is then a pause of just over one second before the asynchronous
*** read callback is invoked again.

1431387372.537203: begin asynchronous read callback on thread 24050
1431387372.537253: read 1024 bytes
1431387372.537307: invoked asynchronous read
1431387372.537317: end asynchronous read callback

*** There is no significant delay when reading the next chunk.

1431387372.537429: begin asynchronous read callback on thread 24050
1431387372.537469: read 182 bytes
1431387372.537478: finished reading the body
1431387372.537746: wrote response
1431387372.537763: transaction 0 destructor
1431387372.537772: end asynchronous read callback

*** The server is then killed with a keyboard interrupt.

^C1431387375.382186: terminating with signal 2
1431387375.382231: initiating shutdown
1431387375.382241: stopping server
1431387375.382363: server run finished
1431387375.382423: receive thread ending
1431387375.382522: AsyncDaemon destructor

As you can see, after the asynchronous read callback is invoked for the first time (and receives zero bytes of data, BTW), it requests another chunk of input. At this point there is a pause of just over one second before that input arrives, from 1431387371.536394 to 1431387372.537203 in this example. What's going on during that time? How can I eliminate that delay?

I have done some research online, and run several experiments (synchronous vs. asynchronous mode with cpp-netlib (no effect), curl vs. libcurl (no effect)), but have not been able to find an answer.

Update: TCP Dump

Upon jxh's suggestion, I ran a tcp dump during a sample transaction:

00:28:01.304446 IP6 localhost.52265 > localhost.8787: Flags [S], seq 3956487146, win 43690, options [mss 65476,sackOK,TS val 395479802 ecr 0,nop,wscale 7], length 0
00:28:01.304461 IP6 localhost.8787 > localhost.52265: Flags [R.], seq 0, ack 3956487147, win 0, length 0
00:28:01.305014 IP localhost.49421 > localhost.8787: Flags [S], seq 1668603425, win 43690, options [mss 65495,sackOK,TS val 395479803 ecr 0,nop,wscale 7], length 0
00:28:01.305039 IP localhost.8787 > localhost.49421: Flags [S.], seq 4010788604, ack 1668603426, win 43690, options [mss 65495,sackOK,TS val 395479803 ecr 395479803,nop,wscale 7], length 0
00:28:01.305079 IP localhost.49421 > localhost.8787: Flags [.], ack 1, win 342, options [nop,nop,TS val 395479803 ecr 395479803], length 0
00:28:01.305185 IP localhost.49421 > localhost.8787: Flags [P.], seq 1:176, ack 1, win 342, options [nop,nop,TS val 395479803 ecr 395479803], length 175
00:28:01.305210 IP localhost.8787 > localhost.49421: Flags [.], ack 176, win 350, options [nop,nop,TS val 395479803 ecr 395479803], length 0
00:28:02.306555 IP localhost.49421 > localhost.8787: Flags [P.], seq 176:1382, ack 1, win 342, options [nop,nop,TS val 395480053 ecr 395479803], length 1206
00:28:02.306620 IP localhost.8787 > localhost.49421: Flags [.], ack 1382, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307223 IP localhost.8787 > localhost.49421: Flags [P.], seq 1:52, ack 1382, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 51
00:28:02.307270 IP localhost.49421 > localhost.8787: Flags [.], ack 52, win 342, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307494 IP localhost.8787 > localhost.49421: Flags [P.], seq 52:66, ack 1382, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 14
00:28:02.307522 IP localhost.49421 > localhost.8787: Flags [.], ack 66, win 342, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307765 IP localhost.8787 > localhost.49421: Flags [F.], seq 66, ack 1382, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307867 IP localhost.49421 > localhost.8787: Flags [F.], seq 1382, ack 67, win 342, options [nop,nop,TS val 395480053 ecr 395480053], length 0
00:28:02.307917 IP localhost.8787 > localhost.49421: Flags [.], ack 1383, win 1373, options [nop,nop,TS val 395480053 ecr 395480053], length 0

I'm not very experienced with tcpdump, but it looks like 175 bytes flow to the server (HTTP headers?), then after a delay of a little over one second, 1206 bytes flow to the server, followed by a 51 byte chunk with minimal delay, followed by the server response.

This tells me that the delay is being introduced on the client side, likely in curl. Does anyone know why?


Solution

  • Problem solved, thanks to debugging and diagnostic techniques suggested by @jxh.

    Adding --trace - --trace-time to the curl command revealed that curl was spending that mysterious second waiting for the server to return a 100 Continue response before it sent the rest of the request:

    01:31:44.043611 == Info: Connected to localhost (127.0.0.1) port 8787 (#0)
    01:31:44.043726 => Send header, 175 bytes (0xaf)
    0000: 50 4f 53 54 20 2f 66 6f 6f 20 48 54 54 50 2f 31 POST /foo HTTP/1
    0010: 2e 31 0d 0a 55 73 65 72 2d 41 67 65 6e 74 3a 20 .1..User-Agent: 
    0020: 63 75 72 6c 2f 37 2e 33 35 2e 30 0d 0a 48 6f 73 curl/7.35.0..Hos
    0030: 74 3a 20 6c 6f 63 61 6c 68 6f 73 74 3a 38 37 38 t: localhost:878
    0040: 37 0d 0a 41 63 63 65 70 74 3a 20 2a 2f 2a 0d 0a 7..Accept: */*..
    0050: 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 Content-Length: 
    0060: 31 32 30 36 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 1206..Content-Ty
    0070: 70 65 3a 20 61 70 70 6c 69 63 61 74 69 6f 6e 2f pe: application/
    0080: 78 2d 77 77 77 2d 66 6f 72 6d 2d 75 72 6c 65 6e x-www-form-urlen
    0090: 63 6f 64 65 64 0d 0a 45 78 70 65 63 74 3a 20 31 coded..Expect: 1
    00a0: 30 30 2d 63 6f 6e 74 69 6e 75 65 0d 0a 0d 0a    00-continue....
    01:31:45.045626 == Info: Done waiting for 100-continue
    01:31:45.045831 => Send data, 1206 bytes (0x4b6)
    

    It is a known deficiency in cpp-netlib (at least as of version 0.11.0) that it does not support sending the 100 Continue response that curl is expecting.

    The solution then became convincing curl to not wait for a 100 Continue response. As I discovered here, adding -H 'Expect:' to the curl command line does the trick. With that, the entire transaction takes about a millisecond.

    Since I answered my own question, I won't accept my answer for a few weeks to give others a chance to contribute something better.