How to implement communication between a client/server object and another of the same type using Thrift

I'm working on a simple distributed system where there are:

1) a "management node" that works as central server

2) one or more "arithmetic nodes" that connect to the management node, register their list of services with it, can request any service and, when needed, handle another node's request.

In order to do this, I've created two services in Thrift, one for the management node and one for the arithmetic node, compiled them to Java, and written their respective handler classes and interfaces.

The arithmetic node's service contains the method used for handling requests dispatched from another node.

Here is my problem.

I'm failing to understand how Thrift works when I have a single object that must work both as client and as server.

I'm in this situation:

I have two arithmetic nodes registered with the management node (two TSockets opened, two TTransports opened, two management clients communicating with the same management server). Then one arithmetic node must request an operation, and the other can respond to the request.

At this point, what exactly should happen? This is where I get lost. A connection between the two nodes must be established, but does it have to be a direct connection between them? Does it mean I have to instantiate an "arithmetic server" and an "arithmetic client"?

Solution

I'm failing to understand how Thrift works when I have a single object that must work both as client and as server.

Thrift itself is just an RPC mechanism. You make a (remote) call, passing in some arguments, and get a result back (which can also be void, or an exception being raised). That said, a server can of course call out to another server from within a server handler routine. The code for that is no different from the code of a simple client.

The problem you face is obviously not so much related to Thrift, but to the design of distributed systems in general, which is a broad topic on its own. I can give you some general outlines, but you will have to look up, read about and try things on your own to get a full understanding of the matter.

Direct call

A connection between the two nodes must be established, but does it have to be a direct connection between them? Does it mean I have to instantiate an "arithmetic server" and an "arithmetic client"?

Leaving aside the question of why one arithmetic node has to call another to solve a task that it could do on its own: yes, that would be the simplest way to do it:

+------------------+                       +-------------------+
|  ArithClient     +------ Calculate() --->+  ArithServer      |
+------------------+                       +-------------------+

In this simple scenario the left-hand node implements the client, the right-hand node implements the server.
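Since each of your nodes may end up on either side of that arrow, here is a minimal in-process sketch of a node that plays both roles. It deliberately uses no Thrift classes: `ArithService` is a hypothetical stand-in for the interface the Thrift compiler would generate from your arithmetic service, and `ArithNode`, `connectTo` and the operation names are made up for illustration. With real Thrift, the `peer` field would hold a generated client created over an open transport instead of a direct object reference.

```java
import java.util.Set;

// Stand-in for the interface the Thrift compiler generates from the
// arithmetic service definition (hypothetical name and signature).
interface ArithService {
    int calculate(String op, int a, int b);
}

// One arithmetic node. It implements the service (server role) and
// keeps a stub for a peer node that it can call (client role).
class ArithNode implements ArithService {
    private final Set<String> supportedOps;
    private ArithService peer; // with Thrift: a generated client over an open transport

    ArithNode(Set<String> supportedOps) {
        this.supportedOps = supportedOps;
    }

    void connectTo(ArithService peer) {
        this.peer = peer;
    }

    @Override
    public int calculate(String op, int a, int b) {
        if (supportedOps.contains(op)) {
            return compute(op, a, b);
        }
        if (peer == null) {
            throw new IllegalStateException("no peer available for: " + op);
        }
        // Inside the handler we simply act as a client of the other node.
        return peer.calculate(op, a, b);
    }

    private static int compute(String op, int a, int b) {
        switch (op) {
            case "add": return a + b;
            case "mul": return a * b;
            default: throw new IllegalArgumentException("unknown op: " + op);
        }
    }
}

class DirectCallDemo {
    public static void main(String[] args) {
        ArithNode adder = new ArithNode(Set.of("add"));
        ArithNode multiplier = new ArithNode(Set.of("mul"));
        adder.connectTo(multiplier);
        System.out.println(adder.calculate("add", 2, 3)); // handled locally
        System.out.println(adder.calculate("mul", 2, 3)); // forwarded to the peer
    }
}
```

The point of the sketch is that being "client and server at once" needs nothing special: the node implements the handler interface and, separately, owns a client stub for whichever peer it wants to call.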

"Relayed" call

But since you also wrote

I'm working on a simple distributed system where there are:

a "management node" that works as central server

one or more "arithmetic nodes" that connect to the management node, register their list of services with it, can request any service and, when needed, handle another node's request

you probably want to handle a situation where the arithmetic nodes don't know each other. It could work like this:

+------------------+                  +-------------------+
|  ArithClient     |                  |  ArithServer      |
+------+-----------+                  +------------+------+
       |                                           ^
    Calculate()                                Calculate()
       |          +-----------------------+        |
       +--------->+    ManagementNode     +--------+
                  +-----------------------+

So we have three nodes, one acting as a client, the third acting as a server, and the middle one acting as a server to the first, and additionally calling out to the third node, thus also acting as a client on that side.
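A rough model of this relay, again in plain Java rather than real Thrift code: `ArithmeticIface` is a hypothetical stand-in for a generated service interface, and `ManagementRelay` and `register` are names I made up. The lambdas play the part of the remote arithmetic nodes; in a real setup each of them would be a generated client talking to a node over its own transport.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a Thrift-generated service interface (hypothetical).
interface ArithmeticIface {
    int calculate(String op, int a, int b);
}

// The management node is a server towards the requesting node and a
// client towards whichever node registered the requested operation.
class ManagementRelay implements ArithmeticIface {
    private final Map<String, ArithmeticIface> providers = new HashMap<>();

    void register(String op, ArithmeticIface provider) {
        providers.put(op, provider);
    }

    @Override
    public int calculate(String op, int a, int b) {
        ArithmeticIface provider = providers.get(op);
        if (provider == null) {
            throw new IllegalArgumentException("no node offers: " + op);
        }
        // Synchronous relay: this call blocks until the provider answers.
        return provider.calculate(op, a, b);
    }
}

class RelayDemo {
    public static void main(String[] args) {
        ManagementRelay management = new ManagementRelay();
        management.register("add", (op, a, b) -> a + b);
        management.register("sub", (op, a, b) -> a - b);
        System.out.println(management.calculate("sub", 7, 4));
    }
}
```

Note that every single calculation passes through `ManagementRelay.calculate`, and the caller waits for the full round trip through both hops.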

While this may work, in practice it puts a heavy burden on the central management node(s), making them the bottleneck of the whole construct. And if all these calls are synchronous, things get even worse.

Service repository

A better approach could be to do it in a slightly different way:

   +------------------+                          +-------------------+
   |  ArithClient     +---- (2) Calculate() ---->+  ArithServer      |
   +------+-----------+                          +-------------------+
          |
(1) please tell me where the
    next free ArithServer is?
          |
          |          +-----------------------+
          +--------->+    ManagementNode     |
                     +-----------------------+

Now we (1) ask the management node only for the information about how to contact a suitable server. Using this information, we do the call (2) directly, not involving the management node any further.
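To make the two steps concrete, here is a small sketch under a few assumptions of mine: addresses are plain `"host:port"` strings, `ServiceRepository`, `register` and `lookup` are hypothetical names, and "next free server" is simplified to "first one registered". The `network` map only simulates connecting to an address; with Thrift you would open a transport to that host and port and create a generated client on top of it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// The management node only stores and returns addresses ("host:port").
class ServiceRepository {
    private final Map<String, List<String>> addressesByOp = new HashMap<>();

    void register(String op, String address) {
        addressesByOp.computeIfAbsent(op, k -> new ArrayList<>()).add(address);
    }

    // Returns the first registered address; a real node might pick the least loaded one.
    Optional<String> lookup(String op) {
        List<String> addresses = addressesByOp.get(op);
        if (addresses == null || addresses.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(addresses.get(0));
    }
}

// Stand-in for the generated arithmetic service interface (hypothetical).
interface ArithEndpoint {
    int calculate(String op, int a, int b);
}

class RepositoryDemo {
    public static void main(String[] args) {
        // Simulated network: address -> service listening there.
        Map<String, ArithEndpoint> network = new HashMap<>();
        network.put("node-b:9091", (op, a, b) -> a * b);

        ServiceRepository management = new ServiceRepository();
        management.register("mul", "node-b:9091");

        // (1) ask the management node where the service lives
        String address = management.lookup("mul")
                .orElseThrow(() -> new IllegalStateException("no node offers mul"));
        // (2) call that node directly; the management node is no longer involved
        int result = network.get(address).calculate("mul", 6, 7);
        System.out.println(address + " -> " + result);
    }
}
```

The management node's interface shrinks to registration and lookup, so its load depends on how often nodes ask for addresses, not on how many calculations are performed.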

To optimize things even more, the client could store that information for a while and call the ArithServer as long and as often as needed. This way the management node only needs to be called again when the server becomes unavailable, the client is restarted, or the stored information has expired.
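One possible shape for such a client-side cache, as a sketch only: `CachingLocator`, `locate` and `invalidate` are hypothetical names, the lookup function stands in for the RPC to the management node, and the clock is injected so the expiry logic can be exercised without waiting. It makes no attempt at thread safety.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Remembers where a service lives so the management node is not asked on every call.
class CachingLocator {
    private static final class Entry {
        final String address;
        final long expiresAt;

        Entry(String address, long expiresAt) {
            this.address = address;
            this.expiresAt = expiresAt;
        }
    }

    private final Function<String, String> registryLookup; // stands in for the RPC to the management node
    private final long ttlMillis;
    private final LongSupplier clock;
    private final Map<String, Entry> cache = new HashMap<>();

    CachingLocator(Function<String, String> registryLookup, long ttlMillis, LongSupplier clock) {
        this.registryLookup = registryLookup;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached address, asking the management node only if
    // nothing is cached for this operation or the entry has expired.
    String locate(String op) {
        long now = clock.getAsLong();
        Entry entry = cache.get(op);
        if (entry == null || now >= entry.expiresAt) {
            entry = new Entry(registryLookup.apply(op), now + ttlMillis);
            cache.put(op, entry);
        }
        return entry.address;
    }

    // Call this when a direct call fails, so the next locate() asks again.
    void invalidate(String op) {
        cache.remove(op);
    }
}

class CacheDemo {
    public static void main(String[] args) {
        CachingLocator locator = new CachingLocator(
                op -> "node-b:9091",        // pretend RPC to the management node
                30_000,                     // keep entries for 30 seconds
                System::currentTimeMillis);
        System.out.println(locator.locate("mul")); // asks the management node
        System.out.println(locator.locate("mul")); // served from the cache
    }
}
```

A client restart empties the cache automatically, which covers the third case mentioned above without any extra code.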

Further approaches

Another approach would involve real asynchronous messaging, like a message bus or MQ system. But that definitely is beyond the scope of this question.