I am working on a distributed database system using Akka. I have code for a database system that works on a single machine and want to make it distributed, but I am having difficulty understanding how to approach this. Below is my current plan for a simple distributed DB, followed by a few questions.
I create a master node and two worker nodes (for now). After creating a relation (table) in the DB, I send a message to the master asking it to distribute the relation. The master should divide the relation into two chunks (one per worker) and create the sub-relations on the worker nodes.
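def receive: Receive = {
  case distributeTable(r: Relation) =>
    // Create the (empty) sub-relation on each worker.
    // Note: r1 is not defined anywhere in this snippet.
    worker1 ! createNewRelation(r1)
    worker2 ! createNewRelation(r1)
    // The first half of the rows goes to worker1, one message per row.
    for (i <- 0 until r.rows / 2) {
      worker1 ! add(r1, r(i))
    }
    // The remaining rows go to worker2.
    for (i <- r.rows / 2 until r.rows) {
      worker2 ! add(r1, r(i))
    }
}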
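The same split can be written for any number of workers by computing the chunks up front and then sending one chunk per worker. This is only a minimal sketch in plain Scala, assuming the rows fit in a `Vector`; `splitIntoChunks` is a hypothetical helper, not part of the code above.

```scala
// Hypothetical helper: splits rows into exactly `workers` contiguous chunks.
// Chunks are as large as ceil(rows.size / workers); trailing chunks may be
// smaller or empty when the rows do not divide evenly.
def splitIntoChunks[A](rows: Vector[A], workers: Int): Vector[Vector[A]] = {
  require(workers > 0, "need at least one worker")
  val chunkSize = math.max(1, math.ceil(rows.size.toDouble / workers).toInt)
  rows.grouped(chunkSize).toVector.padTo(workers, Vector.empty[A])
}
```

Each chunk could then be paired with a worker, for example with `chunks.zip(workerRefs)`, where `workerRefs` is an assumed collection of worker references.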
Any query from the user object is sent to the master, which forwards it to the worker nodes. Each worker runs the query on its sub-relation (smaller table) and sends its result back to the master. Depending on the query, the master performs any additional work on these results and sends the final result to the user object.
Am I thinking correctly about how a distributed DB should work?
Is the implementation of distributeTable correct?
Is there a way to group all the worker nodes and send messages to the group so that they are distributed among the workers? I want to avoid writing code for each individual worker node. For example, instead of worker1 ! msg1, worker2 ! msg1, is there a way to write workers ! msg1 and have it sent to all the worker nodes?
If so, how do I collect the return values from all the workers at the master in this case?
Sounds like you want a Router. A router lets you take one actor and essentially duplicate it, either all within one ActorSystem or distributed across different ActorSystems on a network, where each actor system can be local to one shard of the database.
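As a rough illustration, here is a minimal sketch using classic Akka actors and a `BroadcastPool` router, which forwards every message to all of its routees. The message types (`Query`, `PartialResult`, `FinalResult`) and the `Worker` and `Master` classes are hypothetical names made up for this example, and the worker's "query" is a placeholder. The master collects replies by counting them, which assumes every worker answers exactly once and that only one query is in flight at a time.

```scala
import akka.actor.{Actor, ActorRef, Props}
import akka.routing.BroadcastPool

// Hypothetical messages, not taken from the question's code.
final case class Query(text: String)
final case class PartialResult(rows: Vector[String])
final case class FinalResult(rows: Vector[String])

class Worker extends Actor {
  def receive: Receive = {
    case Query(text) =>
      // Placeholder: a real worker would run the query on its sub-relation.
      sender() ! PartialResult(Vector(s"${self.path.name}: $text"))
  }
}

class Master(workerCount: Int) extends Actor {
  // The pool creates workerCount Worker children; every message sent to
  // `workers` is delivered to all of them, with this actor as the sender.
  private val workers: ActorRef =
    context.actorOf(BroadcastPool(workerCount).props(Props[Worker]()), "workers")

  def receive: Receive = idle

  private def idle: Receive = {
    case q: Query =>
      workers ! q
      context.become(collecting(sender(), Vector.empty, workerCount))
  }

  // Accumulates partial results until `pending` replies have arrived.
  // Queries that arrive while collecting are not handled in this sketch.
  private def collecting(client: ActorRef, acc: Vector[String], pending: Int): Receive = {
    case PartialResult(rows) =>
      val merged = acc ++ rows
      if (pending == 1) {
        client ! FinalResult(merged)
        context.become(idle)
      } else {
        context.become(collecting(client, merged, pending - 1))
      }
  }
}
```

The master can be started with `system.actorOf(Props(new Master(2)))`. If the workers already exist, possibly on other nodes, a group router such as `BroadcastGroup` built from their actor paths plays the same role. With a non-broadcasting router such as `RoundRobinPool`, wrapping a message in `akka.routing.Broadcast` sends it to every routee instead of just one.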