Separate messages buffered together with Lighttpd’s mod_wstunnel and UNIX socket communication to the backend


I have a problem with lighttpd's mod_wstunnel. The backend communicates with lighttpd over a UNIX socket, and if the backend sends two messages in a row (two consecutive send/write calls), lighttpd's logs show that it actually receives one single message combining the two.

If I add a delay between the two writes to the UNIX socket, even one as small as 1 ms, lighttpd receives two separate messages and sends them to the final client as such.

I created a small proof of concept:

The lighttpd conf file (I call lighttpd -D -f lighttpd-poc.conf):

server.document-root = var.CWD 
server.bind = "0.0.0.0" 
server.port = 8042 

server.username = "www" 
server.groupname = "www" 

mimetype.assign = ( 
  ".html" => "text/html", 
  ".txt" => "text/plain", 
  ".jpg" => "image/jpeg", 
  ".png" => "image/png", 
  ".js" => "text/javascript", 
  ".css" => "text/css", 
  ".json" => "application/json", 
) 

static-file.exclude-extensions = ( ".fcgi", ".php", ".rb", "~", ".inc" ) 
index-file.names = ( "index.html" ) 

server.modules += ("mod_wstunnel") 

# requests coming to this URL need text data 
$HTTP["url"] =~ "^/ws" { 
  wstunnel.server = ( 
    "/ws/" => ( 
      ( 
        "socket" => "/tmp/ws.socket", 
      ) 
    ) 
  ) 

  server.stream-response-body = 2 
  wstunnel.ping-interval = 30 # keep the connection alive as long as the client is connected regardless of how often the client interacts with the server 
  wstunnel.frame-type = "text" 

  wstunnel.debug = 5 
} 

# try to avoid any buffering from Lighttpd - this does not work as expected: if the backend sends a 12MB message, Lighttpd will chunk it 
server.stream-response-body = 2 
server.stream-request-body = 2 
# server.chunkqueue-chunk-sz = 1 

The web page being served (poc.html), where the problem can easily be seen:

<!DOCTYPE html> 
<script> 
    var ws = new WebSocket('ws://' + location.host + '/ws/'); 
    ws.onopen = function () { ws.send(JSON.stringify({ "foo": "bar" })); }; 
    ws.onmessage = function (event) { 
        try {
            console.log(JSON.parse(event.data));
        } catch (e) {
            // JSON.parse throws a SyntaxError when two objects arrive concatenated
            console.log(`Parse problem: ${event.data}`);
        }
    }; 
</script> 

The backend (minimal reproducible example) that actually handles the websocket requests (compile & run with g++ poc.cpp && ./a.out):

#include <cstdio> 
#include <unistd.h> 
#include <sys/socket.h> 
#include <sys/un.h> 
#include <cstdlib>
#include <cstring> // memset, strncpy, strlen

#include <iostream> 
#include <thread> 
#include <chrono> 

using namespace std::chrono_literals; 

int main() 
{ 
    std::cout << "Started...\n"; 

    struct sockaddr_un address; 
    int server_fd, client_sock; 

    if ((server_fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) { 
        perror("socket init failure"); 
        exit(EXIT_FAILURE); 
    } 

    unlink("/tmp/ws.socket"); 

    memset(&address, 0, sizeof(address)); 
    address.sun_family = AF_UNIX; 
    strncpy(address.sun_path, "/tmp/ws.socket", sizeof(address.sun_path) - 1); 

    if (bind(server_fd, (struct sockaddr*)&address, sizeof(address)) < 0) { 
        perror("bind failure"); 
        exit(EXIT_FAILURE); 
    } 

    if (listen(server_fd, 42) < 0) { 
        perror("listen failure"); 
        exit(EXIT_FAILURE); 
    } 

    while (true) { 
        socklen_t addrlen = sizeof(address);
        if ((client_sock = accept(server_fd, (struct sockaddr*)&address, &addrlen)) < 0) {
            perror("accept failure"); 
            exit(EXIT_FAILURE); 
        } 

        char buf[] = "{\"foo\": \"BAR\", \"I\": \"\", \"R\": { \"X\": { \"Z\": 1.2699546813964844, \"T\": 2 } }}"; 
        char buf2[] = "{ \"foo\": \"BAR\", \"I\": \"\", \"R\": { \"Y\": { \"C\": 421035002, \"M\": 2147483647 }}}"; 

        int rc; 
        char buffer[1024] = {0}; 

        if ((rc = read(client_sock, buffer, 1024)) > 0) { 
            int num_sent = write(client_sock, buf, strlen(buf)); 

            // std::this_thread::sleep_for(1ms); // with this, it works!!! 
            num_sent = write(client_sock, buf2, strlen(buf2)); 
        } 
    } 

    return 0; 
} 

Please note the commented-out std::this_thread::sleep_for(1ms) line towards the end of the code. With the line commented out, the problem is reproduced (the two messages are bundled into one). With the line uncommented, the problem goes away (the two messages are sent separately).

If you navigate to http://localhost:8042/poc.html, you can see that the JSON.parse call fails (if the above-mentioned line is commented out), since the browser actually receives the two JSON messages as one.

With a binary protocol, I could design the framing so that I know where each message starts and (implicitly) ends, and then parse the data accordingly. In this setup, however, the backend should send JSON objects and the browser should receive and parse them, so I don't know what to do, or why the kernel (presumably) is buffering the two messages. I presume it is the kernel because lighttpd's logs show that it receives either one big message (when the sleep call is commented out) or two smaller messages (when the sleep call is enabled). I therefore conclude that neither lighttpd nor my code is doing the buffering.
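The coalescing can be examined without lighttpd in the picture. The following is a minimal sketch, assuming a POSIX system: read_after_two_writes is a hypothetical helper (not part of the proof of concept) that performs two consecutive write calls on one end of a connected AF_UNIX SOCK_STREAM pair and then a single read on the other end. A stream socket carries a sequence of bytes rather than discrete messages, so one read may return data from several writes.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>
#include <string>

// Hypothetical helper: writes two messages back to back on one end of a
// connected AF_UNIX SOCK_STREAM pair, then performs a single read() on the
// other end and returns whatever that one call delivered.
std::string read_after_two_writes(const char* first, const char* second)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) {
        return std::string();
    }

    std::string received;
    // Two separate write() calls with no delay, as in the backend above.
    if (write(fds[0], first, strlen(first)) >= 0 &&
        write(fds[0], second, strlen(second)) >= 0) {
        char buffer[1024] = {0};
        // One read() with a buffer large enough for both messages.
        ssize_t n = read(fds[1], buffer, sizeof(buffer));
        if (n > 0) {
            received.assign(buffer, static_cast<size_t>(n));
        }
    }

    close(fds[0]);
    close(fds[1]);
    return received;
}
```

Calling read_after_two_writes with the two JSON strings from the backend would be expected to return both objects in one buffer, which matches what lighttpd's logs show when the sleep call is commented out.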

While I could use something like https://www.npmjs.com/package/json-multi-parse or leave the sleep in the code, I'd like to get to the root of the problem, since the sleep solution seems flaky and the multi-parse solution seems unnatural. Moreover, I'd like to understand why and where the buffering takes place.

If it matters, lighttpd’s version is 1.4.55.


Solution

  • As linked in a comment above, the answer to "lighttpd/mod_wstunnel concatenates JSON messages" points to RFC 6455, which says that "An intermediary might coalesce and/or split frames."

    lighttpd mod_wstunnel is a websocket tunnel. mod_wstunnel takes the data from the backend and creates websocket frames, without any further parsing of the backend data. The backend could tunnel data in any format it likes, with JSON being one option. mod_wstunnel is not aware of your JSON message boundaries.

    As an alternative, you could use lighttpd mod_proxy with proxy.header += ("upgrade" => "enable") in lighttpd.conf. Your proxy backend would then implement the websocket protocol itself and so control the websocket framing.
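To illustrate what controlling the websocket framing means for the backend, here is a minimal sketch of encoding one message as a single server-to-client text frame, following the base framing layout in RFC 6455 section 5.2. The name make_text_frame is hypothetical. The sketch covers only frame encoding: a real backend behind mod_proxy would also need to complete the opening handshake, unmask client frames, and handle ping, pong, and close frames.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: wraps one payload in a single unmasked WebSocket
// text frame (FIN=1, opcode 0x1). Server-to-client frames are not masked.
std::string make_text_frame(const std::string& payload)
{
    std::string frame;
    frame.push_back(static_cast<char>(0x81)); // FIN bit set + text opcode

    const std::uint64_t len = payload.size();
    if (len <= 125) {
        // Short payload: length fits in the 7-bit length field.
        frame.push_back(static_cast<char>(len));
    } else if (len <= 0xFFFF) {
        // Marker 126, then a 16-bit length in network byte order.
        frame.push_back(static_cast<char>(126));
        frame.push_back(static_cast<char>((len >> 8) & 0xFF));
        frame.push_back(static_cast<char>(len & 0xFF));
    } else {
        // Marker 127, then a 64-bit length in network byte order.
        frame.push_back(static_cast<char>(127));
        for (int shift = 56; shift >= 0; shift -= 8) {
            frame.push_back(static_cast<char>((len >> shift) & 0xFF));
        }
    }

    frame += payload;
    return frame;
}
```

With this approach each JSON object would be written as its own frame, for example write(client_sock, frame.data(), frame.size()) once per object. Even if the two frames are read together from the socket, the length field in each header marks where one message ends and the next begins.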