What exact role do parameter servers and workers have during distributed training of neural networks (e.g. in Distributed TensorFlow)?
Perhaps it helps to break it down as follows. For example:
Parameter Servers — A parameter server is actually much the same as a worker. Typically it runs on a CPU and is where you store the variables the workers need. In my case, this is where I defined the weight variables needed for my networks.
Workers — This is where we do most of the computationally intensive work. In the forward pass, the workers take the variables from the parameter servers and do something with them. In the backward pass, the workers send the current state back to the parameter servers, which perform an update operation and hand back the new weights to try out (a minimal placement sketch follows this list).
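To make that division of labour concrete, here is a minimal sketch using the legacy tf.train distributed API (tf.compat.v1 in TF 2). The host names, ports, and tensor shapes are made-up placeholders; the only point is that the variables are pinned to the ps job while the compute ops are pinned to a worker.

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Hypothetical cluster: one parameter server and one worker.
cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222"],
})

# Variables (the model weights) live on the parameter server ...
with tf.device("/job:ps/task:0"):
    w = tf.get_variable("w", shape=[784, 10])
    b = tf.get_variable("b", shape=[10])

# ... while the compute-heavy ops run on the worker, which pulls w and b
# from the parameter server during the forward pass.
with tf.device("/job:worker/task:0"):
    x = tf.placeholder(tf.float32, shape=[None, 784])
    logits = tf.matmul(x, w) + b
```

In an actual job, each process would additionally start a tf.train.Server with its own job_name and task_index; the ps processes typically just call server.join() while the workers build and run the training graph.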
Are parameter servers only responsible for storing and providing variable values in an ACID store? ==> Yes, as per the TensorFlow documentation and the Medium article.
Do different parameter servers manage different variables in the graph? ==> Yes, inferred from the following statement from this link:

"In addition to that, you can decide to have more than one parameter server for efficiency reasons. Using parameter servers can provide better network utilization, and it allows scaling models to more parallel machines. It is possible to allocate more than one parameter server."
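A short sketch of that point, again assuming the legacy tf.train API (tf.compat.v1 in TF 2): tf.train.replica_device_setter places each variable on one of the ps tasks (round-robin by default), so with two parameter servers different variables end up on different ps tasks. The host names and variable shapes below are invented for illustration.

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

cluster = tf.train.ClusterSpec({
    "ps":     ["ps0:2222", "ps1:2222"],   # two parameter servers
    "worker": ["worker0:2222"],
})

# replica_device_setter pins variables to the ps tasks and everything else
# to the given worker device.
with tf.device(tf.train.replica_device_setter(
        cluster=cluster, worker_device="/job:worker/task:0")):
    w1 = tf.get_variable("w1", shape=[784, 256])  # e.g. assigned to /job:ps/task:0
    w2 = tf.get_variable("w2", shape=[256, 10])   # e.g. assigned to /job:ps/task:1

print(w1.device, w2.device)  # inspect which ps task each variable landed on
```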
Do parameter servers receive the gradients themselves (and thus add them up)? ==> No. AFAIK, they receive the updated weights, because computing the gradients and modifying the weights using the formula

W1 = W0 - learning_rate * gradients

happens in the workers.
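Below is a minimal sketch of the flow described in this answer, using the legacy tf.train API (tf.compat.v1 in TF 2): the worker computes the gradients and the new weights via W1 = W0 - learning_rate * gradients, and an assign op writes the updated weights back to the variable hosted on the parameter server. The loss, initial values, and learning rate are placeholders, and where the update op actually executes in a real job depends on device placement, so treat this purely as an illustration of the formula above.

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

learning_rate = 0.1  # illustrative value

# W0 lives on the parameter server.
with tf.device("/job:ps/task:0"):
    w = tf.get_variable("w", initializer=tf.constant([0.5, -0.3]))

# The worker runs the backward pass and computes the new weights.
with tf.device("/job:worker/task:0"):
    loss = tf.reduce_sum(tf.square(w))     # stand-in loss
    grads = tf.gradients(loss, [w])[0]     # gradients computed on the worker
    new_w = w - learning_rate * grads      # W1 = W0 - learning_rate * gradients
    send_back = tf.assign(w, new_w)        # updated weights written back to the ps
```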