I have a cluster where one of the nodes is master. The synchronization/decision making who will become the new master is done throught the database(Postgres accessed via java/spring/jpa).
Here's what I currently have
@Repository
public interface ServiceRepository{
@Transactional
@Modifying
@Query("UPDATE ServiceInstance e SET e.master=true"
+ " where e.id=:serviceId AND NOT"
+ " EXISTS (SELECT master from ServiceInstance where master = true)")
int tryToBecomeMaster(@Param("serviceId") String serviceId);
There is one row of ServiceInstance in the DB for each node in the cluster. Each node is trying to update its own row but only if there are no rows already marked as 'master'. That query is ran on regular intervals by each node.
Is that query atomic or is there a race condition? If it is not atomic what transaction isolation level do I need here?
My main concern is that if 2 queries are allowed in parallel they may both calculate the SELECT clause as false(no masters) and update their rows.
To make sure that there can be only one master, create a unique partial index on the table:
CREATE UNIQUE INDEX ON serviceinstance ((1)) WHERE master;
That will give you an error as soon as you try to set master
to TRUE
for more than one row in the table, and the C of the ACID guarantees (consistency) will make sure that this always works. You have to catch and handle the exception in your function.
Let me remark that this is not a good way to implement a high availability cluster: it has a single point of failure in the shape of the database.