
RethinkDB (Python) Change Feed - How to avoid blocking?


New to RethinkDB and want to make sure I'm getting this right.

Is a change feed in RethinkDB always blocking?

The following example is given in the docs (https://rethinkdb.com/docs/changefeeds/python/)

feed = r.table('users').changes().run(conn)
for change in feed:
    print(change)

Running this in the main thread will block the thread forever. So basically I now have it running in a separate thread with a sleep timer.

This starts to feel a whole lot like polling. Isn't the whole idea to not have to do that?

So here are the questions:

  • Is there a callback version of this I've missed?

  • Is running the change feed loops in threads what's suggested? Any problems with doing so?

  • Is it the same in node.js? (I remember seeing some callbacks in the node.js examples, but perhaps that was just the async .run call)

Haven't been able to find any real-world examples of this in use; the docs simply tell you to open a separate terminal window / python process and run it there.

Appreciate any help / clarification, thanks!


Solution

  • Is a change feed in RethinkDB always blocking?

    Yes. The cursor has to block so that your code can receive each element as it arrives on the changefeed. The documentation says: "Unlike other cursors, the output of changes is infinite: the cursor will block until more elements are available."

    Running this in the main thread will block the thread forever.

    Not really: you get control of the thread back every time a new value is obtained from the changefeed, and at that point you can do something other than just print the change, or simply break out of the for statement. But yes, the thread is blocked until the next changefeed value is read from the RethinkDB connection.
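
To illustrate breaking out of the loop, here is a minimal sketch. The helper name first_matching_change and the predicate argument are made up for illustration; feed stands for whatever r.table('users').changes().run(conn) returns, and any iterable works in its place.

```python
def first_matching_change(feed, predicate):
    """Read changes until one satisfies predicate, then stop reading.

    `feed` is assumed to be a changefeed cursor, e.g. (as in the question):
        feed = r.table('users').changes().run(conn)
    Each change is a dict with 'old_val' and 'new_val' keys.
    """
    for change in feed:          # blocks here until the next change arrives
        if predicate(change):
            return change        # leaving the loop hands control back to the caller
    return None                  # only reached if the feed is finite


if __name__ == "__main__":
    # A plain list stands in for the real (infinite) changefeed.
    fake_feed = [
        {"old_val": None, "new_val": {"id": 1, "name": "alice"}},
        {"old_val": None, "new_val": {"id": 2, "name": "bob"}},
    ]
    print(first_matching_change(fake_feed, lambda c: c["new_val"]["id"] == 2))
```

With a real cursor you would probably also want to close it once you are done with it, so the server stops sending changes.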

    Is there a callback version of this I've missed?

    No, but you can easily implement callback-oriented code around the changes() method if you really need it.
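
A callback wrapper could look roughly like the sketch below. on_change is a hypothetical helper, not part of the RethinkDB driver; it only assumes that feed is iterable, which is how the cursor is used in the question.

```python
import threading


def on_change(feed, callback):
    """Call callback(change) for every change, on a background thread.

    `feed` is assumed to be a changefeed cursor (or any iterable).
    Returns the worker thread so the caller can join it if the feed is finite.
    """
    def pump():
        for change in feed:      # blocks inside the worker, not in the caller
            callback(change)

    worker = threading.Thread(target=pump, daemon=True)
    worker.start()
    return worker


if __name__ == "__main__":
    # A list stands in for the real changefeed so the sketch runs on its own.
    worker = on_change([{"new_val": {"id": 1}}, {"new_val": {"id": 2}}], print)
    worker.join()
```

Note that the callback runs on the worker thread. I believe the driver's connections are not meant to be shared between threads, so in real code it is safer to open the connection and the feed inside the worker itself.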

    Is running the change feed loops in threads what's suggested? Any problems with doing so?

    It depends on how your particular application is designed. You might have a single-threaded application that listens to an infinite changefeed and does something other than just printing the new change value. If your application has to do more than just listen to the changefeed, then yes, you have to create a new thread and iterate through the changefeed stream on that thread.
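
One way to do this without a sleep timer is to let the feed thread push changes onto a queue.Queue and have the main thread take them off. This is only a sketch under that assumption; start_feed_thread and drain are illustrative names, and feed is again anything iterable.

```python
import queue
import threading


def start_feed_thread(feed):
    """Pump `feed` into a thread-safe queue from a daemon thread."""
    changes = queue.Queue()

    def pump():
        for change in feed:      # the only place that blocks on the feed
            changes.put(change)

    threading.Thread(target=pump, daemon=True).start()
    return changes


def drain(changes):
    """Return every change that is waiting right now, without blocking."""
    pending = []
    while True:
        try:
            pending.append(changes.get_nowait())
        except queue.Empty:
            return pending


if __name__ == "__main__":
    changes = start_feed_thread([{"new_val": {"id": 1}}])
    # The main thread can block with a timeout instead of sleeping and polling...
    print(changes.get(timeout=5))
    # ...or pick up whatever has arrived between other pieces of work.
    print(drain(changes))
```

changes.get() wakes up as soon as a change is put on the queue, so nothing is polled on a timer; drain() is for main loops that have other work to do and only want to check in occasionally.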

    Is it the same in node.js? (I remember seeing some callbacks in the node.js examples, but perhaps that was just the async .run call)

    Yes, this is just because Node.js encourages applications to be completely asynchronous. Once you read the changefeed cursor with cursor.each(console.log);, it will run infinitely just like the Python version does (however, I don't really remember how to break out of the each method). Java, unlike JavaScript but like Python, also lets you iterate over each element in the changes cursor and blocks until a new change arrives.

    Haven't been able to find any real-world examples of this in use; the docs simply tell you to open a separate terminal window / python process and run it there.

    Well, this is the easiest way to demonstrate how it works: you listen for changes on a certain changefeed (let it just be a simple CLI application working in a terminal) and do whatever you want with each change while you modify the database from elsewhere (it may be the built-in web interface, recli, your RethinkDB-based application, etc).

    I can share a simple real-life example from my previous experience in Java (+ Spring Framework) where I used RethinkDB for the first time: imagine that you have a document conversion REST service that just accepts certain documents and converts them to images, but you also want to monitor the conversion status in real-time right in your browser. How it was implemented:

    • The REST service can accept multiple connections to process multiple conversion requests in parallel (it must be multi-threaded, of course). These requests convert uploaded documents and save their conversion statuses to a specific table in the RethinkDB database.
    • Also, this REST service listens for changes to that table in the RethinkDB database using the changes() method in a separate "forever" thread, reading the statuses from the table indefinitely and exposing them via a web socket to the outer world so that you can monitor them right in your web browser without any kind of polling. You don't even need this thread to be terminated, because it's "forever" by design.
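
The original service was written in Java with Spring, so the following Python sketch is not that code, just the same shape: one "forever" thread reads the feed and fans every change out to whoever is currently listening. StatusBroadcaster and its methods are hypothetical names, and per-subscriber queues stand in for the web socket connections.

```python
import queue
import threading


class StatusBroadcaster:
    """Fan each change from one feed out to every subscriber's queue."""

    def __init__(self):
        self._subscribers = []
        self._lock = threading.Lock()

    def subscribe(self):
        """Register a listener; in the real service this would be a web socket."""
        inbox = queue.Queue()
        with self._lock:
            self._subscribers.append(inbox)
        return inbox

    def publish(self, change):
        with self._lock:
            targets = list(self._subscribers)   # copy so put() runs outside the lock
        for inbox in targets:
            inbox.put(change)

    def run_forever(self, feed):
        """Body of the "forever" thread: blocks on the feed, never polls."""
        for change in feed:
            self.publish(change)


if __name__ == "__main__":
    broadcaster = StatusBroadcaster()
    browser_a = broadcaster.subscribe()
    browser_b = broadcaster.subscribe()
    # A list stands in for the changefeed on the status table.
    fake_feed = [{"old_val": None, "new_val": {"id": 7, "status": "converted"}}]
    threading.Thread(target=broadcaster.run_forever,
                     args=(fake_feed,), daemon=True).start()
    print(browser_a.get(timeout=5))
    print(browser_b.get(timeout=5))
```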

    Other good real-time examples that come to mind are chats (instant messaging), document sharing (watching real-time folder changes), real-time multi-user document collaboration, and anything else you might need to build with real-time in mind.