I'm writing an HTTP server in C. Note that I DO KNOW how to parse a string in an HTTP request format. This means I can easily parse out the headers after I've received them.
My struggle is the following:
The HTTP protocol is built on top of TCP sockets, and thus the request sent by the client is not guaranteed to be delivered in one piece after just one read() operation. I therefore need to read the request up to the end of the headers, get Content-Length, and then proceed to read() the body, knowing how much data I should read.
I'm using nonblocking IO, in case that matters to some readers.
For this, I have two ideas, each of which has a serious drawback:
1. read() one byte at a time, checking after every read() whether the end of the buffer is "\r\n\r\n". Then get Content-Length and read the body. Highly inefficient, due to the number of read() syscalls.
2. read() into a buffer in larger chunks, checking each time whether the end of the request has been read by using strstr() to find the "\r\n\r\n" substring. When "\r\n\r\n" is found, save in a variable n the number of chars read that follow it, get Content-Length, and proceed to read Content-Length - n more chars. Also inefficient, due to having to call strstr() after every read().
Any suggestions as to how to do it more efficiently?
IMPORTANT! I understand that the second approach is better. I'm looking for some new suggestions that are better than mine.
You ask about parsing HTTP headers and describe your method, which, to sum up, is reading into a buffer in larger chunks. You then explain your technique for finding the end of all the headers, and you end with the question:

"Any suggestions as to how to do it more efficiently?"

With no specific restriction on how to do it more efficiently.
You need to measure the performance of your software and identify the hot spots in your code, i.e., code that takes longer than you expect.
For the sake of argument, let us assume you are efficiently reading in socket data in chunks to create a potential HTTP message header block, and header parsing turns out to be a hot spot in your software.
If your header parsing turns out to be a bottleneck, and you believe it is related to scanning for the end of the headers, then you need a hypothesis for why it is a bottleneck at all.
Since you don't post any code, we are forced to speculate based on your general description. However, here are some guesses:
Inefficient scanning with strstr()

This guess is based on the fact that you are treating your HTTP headers as C strings by using a string function. Therefore you have to nul terminate (add '\0' to the end of your header data) and then do a needle-in-the-haystack search for "\r\n\r\n".
You are sort of required to start from the beginning, because HTTP pipelining might stuff more than one request into a single received buffer. The comparison test code has to scan for the nul terminator before it knows it is done, so that makes the loop condition a little more complicated.
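For illustration only, the pattern being guessed at might look something like this (find_headers_end, buf, and len are made-up names; nothing here is from the question):

#include <string.h>

/* Guessed-at pattern: treat the received bytes as a C string and search
 * the whole buffer for the blank line after every read.  Requires one
 * spare byte in buf for the terminator.  Returns a pointer just past
 * "\r\n\r\n", or NULL if the headers are not complete yet. */
static char *find_headers_end(char *buf, size_t len)
{
    buf[len] = '\0';                      /* nul terminate the data    */
    char *end = strstr(buf, "\r\n\r\n");  /* full rescan on every call */
    return end ? end + 4 : NULL;
}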
Possible O(n²) behavior
Based on the assumption that profiling has shown the scanning to be a bottleneck, one cause may be that you are concatenating your newly received data onto your previously received data so that you can do the strstr() call again to find the end of the headers, since you did not find it before.

If you are using strcat() to join your old buffer, converted into a string, with your new buffer, converted into a string, then this causes another scan of your old data to find the end of the old buffer in order to perform the concatenation.

If you are starting your strstr() call from the beginning again after concatenation, this causes another scan of your old data to find the "\r\n\r\n" that you had already figured out was not there.
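Purely for illustration, that anti-pattern might look like this (headers and chunk are made-up names, not anything from the question):

#include <string.h>

/* Anti-pattern sketch: every new chunk triggers two full walks over the
 * data already received.  Assumes both arguments are nul terminated. */
static char *append_and_search(char *headers, const char *chunk)
{
    strcat(headers, chunk);              /* walks headers to find its end */
    return strstr(headers, "\r\n\r\n");  /* walks headers from the start  */
}

If the headers total N bytes and arrive in k chunks, the repeated rescans add up to roughly O(N·k) work instead of O(N), and in the byte-at-a-time case that degenerates to O(n²).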
Since we have no code, it is kind of silly to provide fixes for made-up problems. However, for the sake of posterity, even if it doesn't apply to your problem, it is still helpful to give a complete answer.
You can just scan for '\n' instead, and see if it is followed by an empty line. This reduces the bulk of the scanning work to a simple character comparison, and it will have nice linear behavior.
Do not use strcat() to concatenate. Your header data should be copied into a buffer that is expected to hold all the headers most of the time (maybe something like 16 KB). When concatenating, simply jump to the end of what was previously scanned, and copy the newly read data from that point.
Do not scan for the end of headers from the beginning. Instead, after concatenation is done, start your scan from the point where you left off, which is the beginning of the newly read/copied data.
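Putting these three fixes together, a sketch might look like this (buf, used, and scanned are assumed bookkeeping variables, not anything from the question):

#include <string.h>

/* Append the new chunk at the known end of the buffer (no strcat()), then
 * resume scanning for the blank line where the previous scan stopped.
 * used = bytes currently in buf; scanned = bytes already examined.
 * Returns the offset just past "\r\n\r\n", or 0 if not found yet. */
static size_t append_and_scan(char *buf, size_t *used, size_t *scanned,
                              const char *chunk, size_t n)
{
    memcpy(buf + *used, chunk, n);   /* jump straight to the end; no rescan */
    *used += n;
    for (size_t i = *scanned; i < *used; i++) {
        /* The final '\n' of "\r\n\r\n" must lie in unscanned data, so
         * starting at *scanned cannot miss a terminator that straddles
         * a chunk boundary: the comparison looks backward 3 bytes. */
        if (buf[i] == '\n' && i >= 3 &&
            memcmp(&buf[i - 3], "\r\n\r\n", 4) == 0)
        {
            *scanned = i + 1;
            return i + 1;            /* headers end here */
        }
    }
    *scanned = *used;                /* remember where to resume next time */
    return 0;
}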
If you have removed the above issues, then after profiling, you should see reasonably good performance from your header parsing.
However, it is still possible to be more efficient. Since the suggestion above already goes through the work of identifying the end of each header line, you could just build your header parser into that scanning loop.

After finding the \n, the previous character should be \r, so you know the length of the header line just scanned. Since you are just looking for \n now, you can use memchr() instead of strstr(), which avoids the need to nul terminate your input.
If you stash the location of each end of line, you also have the beginning of the next header line.
You know you are done parsing headers when you reach an empty header line.
This allows you to parse the headers and find the end of the headers in a single scan of the input.
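A sketch of that single-scan approach, reusing the buf/used/scanned bookkeeping from the sketch above (the on_header_line callback is hypothetical):

#include <string.h>

/* Single pass: find each '\n' with memchr(), hand each completed line to a
 * callback, and stop at the empty line that ends the headers.  line_start
 * and scanned persist between calls; returns 1 once the end of the headers
 * has been reached. */
static int parse_headers(char *buf, size_t used,
                         size_t *line_start, size_t *scanned,
                         void (*on_header_line)(const char *line, size_t len))
{
    char *p;
    while ((p = memchr(buf + *scanned, '\n', used - *scanned)) != NULL) {
        size_t end = (size_t)(p - buf) + 1;      /* one past the '\n'   */
        size_t len = end - *line_start;          /* includes the "\r\n" */
        if (len == 2 && buf[*line_start] == '\r')
            return 1;                            /* empty line: done    */
        on_header_line(buf + *line_start, len);  /* len includes CRLF   */
        *scanned = *line_start = end;            /* next line starts here */
    }
    *scanned = used;                             /* no '\n' in the rest */
    return 0;
}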
Instead of performing data concatenation, you can just allocate a large buffer to represent the header block and use that same large buffer for your recv() calls. This avoids the need to concatenate: when you fail to find the end of the headers, you call recv() again directly, with the offset into the buffer starting at the end of the previously read chunk.
offset = 0;
bufsz = BUFSZ;
while (NOT_END_OF_HEADERS) {
    if (offset >= bufsz)
        HANDLE_HEADERS_TOO_LARGE;    /* buffer is full but no "\r\n\r\n" yet */
    n = recv(sock, buf + offset, bufsz - offset, 0);
    if (ERROR_OR_NEED_TO_STOP) HANDLE_IT;
    RESUME_PARSE(buf + offset, buf + offset + n);
    offset += n;
}
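For posterity again, a minimal concrete rendering of that pseudocode, assuming a nonblocking socket and the hypothetical parse_headers() sketch from above:

#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

enum { BUFSZ = 16 * 1024 };

/* Returns 1 when the header block is complete, 0 to retry later when the
 * socket would block, and -1 on error or when the headers exceed BUFSZ.
 * parse_headers() is the single-scan sketch shown earlier. */
static int read_headers(int sock, char *buf, size_t *offset,
                        size_t *line_start, size_t *scanned,
                        void (*on_header_line)(const char *, size_t))
{
    for (;;) {
        if (*offset >= BUFSZ)
            return -1;                    /* headers too large for buffer */
        ssize_t n = recv(sock, buf + *offset, BUFSZ - *offset, 0);
        if (n < 0)
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
        if (n == 0)
            return -1;                    /* peer closed mid-headers */
        *offset += (size_t)n;
        if (parse_headers(buf, *offset, line_start, scanned, on_header_line))
            return 1;
    }
}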
Regular application programs normally don't worry too much about using too much memory. They are relatively short-lived, so the memory they use is relinquished back to the system relatively quickly.

However, long-running programs on embedded systems usually need to be much more miserly. So, in addition to CPU profiling, the system will be memory profiled as well, and memory hogs will be scrutinized.
This is why embedded software will typically use buffer structures that link together smaller buffers to represent a streamed message, rather than one contiguous large buffer. This allows better right-sizing of memory usage and can avoid issues related to memory fragmentation.
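A sketch of such a chained-buffer structure (the node size and names are illustrative):

#include <stddef.h>

/* A message is represented as a chain of small fixed-size buffers, so
 * memory use tracks the actual message size, and the uniform allocation
 * size helps avoid fragmentation. */
struct buf_node {
    struct buf_node *next;    /* next chunk of the message, or NULL */
    size_t len;               /* bytes used in data[]               */
    unsigned char data[512];  /* small fixed-size chunk             */
};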
Optimizations should first be approached with measurement. When actually doing optimizations, the solutions will not always be straightforward. However, resolving a performance bottleneck can be very satisfying for a developer.