python tensorflow distributed-computing grpc

Running distributed TensorFlow fails with UnavailableError: Endpoint read failed

I am new to TensorFlow and am now trying out distributed TensorFlow.

Following the official guide, I first create two servers by running the following code in two separate terminals, passing the task index (0 or 1) as the command-line argument:

import sys
import tensorflow as tf

task_number = int(sys.argv[1])

cluster = tf.train.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
server = tf.train.Server(cluster, job_name="local", task_index=task_number)

print("Starting server #{}".format(task_number))

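# Redundant: tf.train.Server starts on construction by default (start=True),
# which is why the log below reports "Server already started".
server.start()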
server.join()

Both servers start successfully. The output of server #0 is:

2018-01-25 20:05:37.651802: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job local -> {0 -> localhost:2222, 1 -> localhost:2223}
2018-01-25 20:05:37.652881: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:2222
Starting server #0
2018-01-25 20:05:37.652938: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:328] Server already started (target: grpc://localhost:2222)

Then I run the following client program:

import tensorflow as tf
x = tf.constant(2)

with tf.device("/job:local/task:1"):
    y2 = x - 66

with tf.device("/job:local/task:0"):
    y1 = x + 300
    y = y1 + y2

with tf.Session("grpc://localhost:2223") as sess:
    result = sess.run(y)
    print(result)

It fails with the following error message:

E0125 20:05:49.573488650   10292 ev_epoll1_linux.c:1051]     grpc epoll fd: 5
Traceback (most recent call last):
  File "/home/****/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/home/****/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1293, in _run_fn
    self._extend_graph()
  File "/home/****/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1354, in _extend_graph
    self._session, graph_def.SerializeToString(), status)
  File "/home/****/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnavailableError: Endpoint read failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/****/Documents/intern/sample_data/try.py", line 25, in <module>
    result = sess.run(y)
  File "/home/****/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/****/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/****/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/****/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnavailableError: Endpoint read failed

I googled the error, and some answers suggest it might be a proxy problem, so I disabled the proxy, but nothing changed.

Does anyone have an idea what the problem might be? Many thanks in advance.

Solution

Never mind, the problem is solved. It was the proxy setting after all. The proxy has to be unset for both the server processes and the client, not just one of them, for the program to work.
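
For anyone hitting the same error, here is a minimal sketch of one way to unset the proxy from inside the scripts themselves. The variable names in `PROXY_VARS` and the helper `clear_proxy_env` are my own assumptions for illustration, not something from the TensorFlow guide; check which proxy variables your shell actually exports and adjust the list.

```python
import os

# Assumed list of proxy variables; adjust to what your environment exports.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def clear_proxy_env(env=os.environ):
    """Remove proxy variables from env and return the names that were set."""
    removed = []
    for name in PROXY_VARS:
        if env.pop(name, None) is not None:
            removed.append(name)
    return removed


# Call this at the top of BOTH the server script and the client script,
# before creating tf.train.Server or tf.Session.
removed = clear_proxy_env()
print("Removed proxy variables:", removed)
```

This only changes the environment of the current Python process, so it has to run in each server process and in the client. Unsetting the variables in each terminal before launching the scripts (for example with `unset http_proxy https_proxy` in bash) should have the same effect.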