tensorflow, distributed-computing

How to run TensorFlow on multiple nodes with several CPUs each


I want to run linear regression with TensorFlow on very large datasets. I have a cluster of 9 nodes, each with 36 CPUs. What is the best way to distribute the computation across all the available resources?

According to this course, https://www.coursera.org/learn/intro-tensorflow, the best way to use TensorFlow in a distributed setting is to use Estimators. So I wrote my code as suggested there and followed the instructions at https://www.tensorflow.org/deploy/distributed for the parallelisation. I then tried to run my script my_code.py (on a "small" dataset of 120 million data points with 2 feature columns, to test the code) on nodes 2 and 3 as follows:

python my_code.py \
  --ps_hosts=node1:2222 \
  --worker_hosts=node2:2222,node3:2222 \
  --job_name=worker \
  --task_index="i-2"

where i is the number of the node (either 2 or 3), whereas on node 1 I run the same command but with --job_name=ps and --task_index=0. However, this way it seems that only one CPU per node is used. Do I need to specify each CPU individually?
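
For concreteness, the corresponding command on node 1 (the parameter server) would be:

python my_code.py \
  --ps_hosts=node1:2222 \
  --worker_hosts=node2:2222,node3:2222 \
  --job_name=ps \
  --task_index=0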

Thank you in advance.


Solution

  • As far as I understand, the best thing to do is to use all the CPUs of a node together as a single worker, in order to make the most of the shared memory. So in the case above, one would specify only 9 workers, one per node, and make sure that each worker actually uses all 36 CPUs of its node. The exact commands to do this depend on the specific cluster used; one possible setup is sketched below.
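
Here is a minimal sketch of what that could look like with the TF 1.x Estimator API. The node names, ports, feature names and input pipeline are placeholders, the cluster layout (ps on node1, chief plus workers on the remaining nodes) is just one example, and the task type/index would normally be filled in per machine by your cluster's launcher:

import json
import os

import tensorflow as tf  # sketch targets the 1.x Estimator API

# Hypothetical cluster layout: node1 is the parameter server, node2 is
# the chief, nodes 3-9 are workers -- one worker process per node,
# NOT one per CPU.
cluster = {
    "chief": ["node2:2222"],
    "worker": ["node%d:2222" % i for i in range(3, 10)],
    "ps": ["node1:2222"],
}

# Each machine exports its own TF_CONFIG; here hard-coded for node2.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": cluster,
    "task": {"type": "chief", "index": 0},
})

# Size the session threadpools so one worker can use all 36 CPUs.
session_config = tf.ConfigProto(
    intra_op_parallelism_threads=36,  # parallelism inside a single op
    inter_op_parallelism_threads=36,  # independent ops run concurrently
)
run_config = tf.estimator.RunConfig(session_config=session_config)

# Two numeric features, as in the question; the names are made up.
feature_columns = [
    tf.feature_column.numeric_column("x1"),
    tf.feature_column.numeric_column("x2"),
]

def input_fn():
    # Placeholder pipeline; replace with the real 120M-point dataset.
    data = {"x1": [0.0, 1.0], "x2": [1.0, 2.0]}
    labels = [0.0, 1.0]
    return tf.data.Dataset.from_tensor_slices((data, labels)).batch(2)

estimator = tf.estimator.LinearRegressor(
    feature_columns=feature_columns, config=run_config)

tf.estimator.train_and_evaluate(
    estimator,
    tf.estimator.TrainSpec(input_fn=input_fn),
    tf.estimator.EvalSpec(input_fn=input_fn),
)

With this layout, TF_CONFIG takes the place of the --ps_hosts/--worker_hosts flags, and the threadpool sizes let a single worker process saturate its node's 36 CPUs instead of leaving 35 of them idle.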