javascript node.js express sockets real-time

Stuck on approach for real-time data with custom filters


I've been scratching my head over this for about a week now, so I hope I can find some help here.

I'm making an application that provides real-time data to the client. I've thought about Server-Sent Events, but that doesn't allow per-user responses AFAIK.

WebSockets are also an option, but I'm not convinced about them. Let me sketch the scenario I implemented with WS:

  1. Server fetches 20 records every second, and pushes these to an array
  2. This array gets sent to all websocket connections every second, see this pseudo below:
let items = [ { ... some-data ... } ];

io.on("connection", socket => {
  setInterval(() => {
    io.emit("all_items", items);
  }, 1000);
});
  3. The user can select some items in the front end; the websocket receives this per connection

However, I'm convinced the way I'm approaching this is not good and enormously inefficient. Let me sketch the scenario of what I want to achieve:

  • There is a database with, let's say, 1,000 records
  • User connects to the back-end from a (React) front-end and gets connected to the main "stream" of about 20 fetched records (without filters), which the server fetches every second: SELECT * FROM Items LIMIT 20

Here comes the complex part:

  • The user clicks some checkboxes with custom filters (in the front-end), e.g. location = Shelf 2. What's supposed to happen is that the websocket ALWAYS shows 20 records for that user, no matter what the filters are

I've imagined having a custom query for each user with custom options, but I think that's bad and will absolutely destroy the server if you have, say, 10,000 users.

How would I be able to take this on? Please, everything helps a little, thank you in advance.


Solution

  • I have to do some guessing about your app. Let me try to spell it out while talking just about the server's functionality, without mentioning MySQL or any other database.

    I guess your server maintains about 1k datapoints with volatile values. (It may use a DBMS to maintain those values, but let's ignore that mechanism for the moment.) I guess some process within your application changes those values based on some kind of external stimulus.

    Your clients, upon first connecting to your server, start receiving a subset of twenty of those values once a second. You did not specify how to choose that initial subset. All newly-connected clients get the same twenty values.

    Clients may, while connected, apply a filter. When they do that, they start getting a different, filtered, subset from among all the values you have. They still get twenty values. Some or all the values may still be in the initial set, and some may not be.

    I guess the clients get updated values each second for the same twenty datapoints.

    You envision running the application at scale, with many connected clients.

    Here are some thoughts on system design.

    1. Keep your datapoints in RAM in a suitable data structure.
    2. Write js code to apply the client-specified filters to that data structure. If that code is efficient you can handle millions of data points this way.
    3. Back up that RAM data structure to a DBMS of your choice; MySQL is fine.
    4. When your server first launches load the data structure from the database.
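    Steps 1 and 2 above could be sketched like this. All names here (and the sample records) are illustrative assumptions, not an established API:

    ```javascript
    // In-RAM store: one plain object per datapoint (loaded from the DB at startup).
    const items = [
      { id: 1, location: "Shelf 1", value: 42 },
      { id: 2, location: "Shelf 2", value: 17 },
      { id: 3, location: "Shelf 2", value: 99 },
    ];

    // `filters` is an object like { location: "Shelf 2" }; an empty object
    // means "no filtering". Returns at most `limit` matching items.
    function applyFilters(items, filters, limit = 20) {
      const entries = Object.entries(filters);
      const matches = items.filter(item =>
        entries.every(([key, value]) => item[key] === value)
      );
      return matches.slice(0, limit);
    }

    console.log(applyFilters(items, { location: "Shelf 2" }).length); // 2
    console.log(applyFilters(items, {}).length); // 3
    ```

    A linear scan like this over even a million in-RAM objects is cheap compared to issuing one SQL query per connected user per second.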

    To get to the scale you mention you'll need to load-balance all this across at least five servers. You didn't mention the process for updating your datapoints, but it will have to fan out to multiple servers, somehow. You need to keep that in mind. It's impossible to advise you about that with the information you gave us.

    But, YAGNI. Get things working, then figure out how to scale them up. (It's REALLY hard work to get to 10K users; spend your time making your app excellent for your first 10, then 100 users, then scale it up.)

    Your server's interaction with clients goes like this (ignoring authentication, etc.):

    1. A client connects, implicitly requesting the "no-filtering" filter.
    2. The client gets twenty values pushed once each second.
    3. A client may implicitly request a different filter at any time.
    4. Then the client continues to get twenty values, chosen by the selected filter.

    So, most client communication is pushed out, with an occasional incoming filter request.
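    The four steps above can be sketched framework-agnostically as a small per-client stream object; `send` is whatever pushes one batch to one client (a socket.io `socket.emit`, an SSE `res.write`, etc.). Every name here is an illustrative assumption:

    ```javascript
    // Minimal matcher: an empty `filters` object matches everything.
    function applyFilters(items, filters, limit = 20) {
      return items
        .filter(it => Object.entries(filters).every(([k, v]) => it[k] === v))
        .slice(0, limit);
    }

    // One of these is created per connected client.
    function createClientStream(items, send, intervalMs = 1000) {
      let filters = {}; // step 1: the implicit "no-filtering" filter

      const timer = setInterval(() => {
        // steps 2 and 4: push up to twenty values chosen by the current filter
        send(applyFilters(items, filters, 20));
      }, intervalMs);

      return {
        setFilters(f) { filters = f || {}; }, // step 3: occasional filter request
        close() { clearInterval(timer); },    // on disconnect
      };
    }
    ```

    Hooking this into socket.io would then just mean calling `createClientStream` in the `connection` handler, `setFilters` from an incoming event, and `close` on `disconnect`.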

    This lots-of-downbound-traffic, little-bit-of-upbound-traffic pattern is an ideal scenario for Server-Sent Events. WebSockets or socket.io are also fine. You could structure it like this.

    1. New clients connect to the SSE endpoint at https://example.com/stream

    2. When applying a filter they reconnect to another SSE endpoint at https://example.com/stream?filter1=a&filter2=b&filter3=b

    3. The server sends data each second to each open SSE connection, applying the filter. (Streams work very well for this in Node.js; take a look at the server-side code of the signalhub package for an example.)
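    The SSE endpoint above could be sketched as an Express-style handler, where the query string carries the filter (so reconnecting to /stream?location=Shelf%202 applies a new one). The handler signature is plain `(req, res)`, so it plugs into `app.get("/stream", sseHandler(items))`; the helper names are illustrative assumptions:

    ```javascript
    // Returns an Express-compatible request handler that streams filtered
    // batches over SSE. Query parameters arrive as strings, so values are
    // compared as strings here.
    function sseHandler(items, intervalMs = 1000) {
      return (req, res) => {
        res.writeHead(200, {
          "Content-Type": "text/event-stream",
          "Cache-Control": "no-cache",
          Connection: "keep-alive",
        });

        const filters = { ...req.query }; // the filter comes from the URL

        const timer = setInterval(() => {
          const batch = items
            .filter(it =>
              Object.entries(filters).every(([k, v]) => String(it[k]) === v)
            )
            .slice(0, 20);
          // SSE wire format: "data: <payload>\n\n"
          res.write(`data: ${JSON.stringify(batch)}\n\n`);
        }, intervalMs);

        req.on("close", () => clearInterval(timer)); // stop when client disconnects
      };
    }
    ```

    On the React side, `new EventSource("/stream?location=Shelf%202")` would receive each batch in its `onmessage` handler, and changing filters is just closing that EventSource and opening a new one.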