I have a TensorFlow model training on an Ubuntu 16.04 virtual machine on Azure. Suddenly, the TensorBoard process is no longer reachable from outside. The Network Security Group should be configured properly (see picture) and, as I said, it worked until this evening. I didn't change anything on the machine. Is there anything I can check? Any hints? Thanks!
First check whether anything is listening on the port (TensorBoard listens on port 6006 by default):
netstat -ant | grep 6006
If TensorBoard is running, you should get output like the following:
shui@shui:~$ netstat -ant|grep 6006
tcp 0 0 0.0.0.0:6006 0.0.0.0:* LISTEN
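If you want to script this check, a small sketch like the following may help. The functions listen_addrs and port_open are hypothetical helpers, not standard tools; listen_addrs parses netstat -ant output (columns: proto, recv-q, send-q, local address, foreign address, state), and port_open assumes bash's /dev/tcp pseudo-device and the coreutils timeout command are available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking whether TensorBoard is up on a port.

# listen_addrs PORT: read `netstat -ant` output on stdin and print the local
# address of every socket in LISTEN state on PORT (e.g. 0.0.0.0:6006).
listen_addrs() {
  awk -v p=":$1" '$NF == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { print $4 }'
}

# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be opened
# within 2 seconds, using bash's /dev/tcp pseudo-device.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Usage on the VM:
#   netstat -ant | listen_addrs 6006
#   port_open 127.0.0.1 6006 && echo "TensorBoard accepts local connections"
```

If listen_addrs prints nothing, TensorBoard is not running. If it prints only 127.0.0.1:6006 instead of 0.0.0.0:6006, the process is bound to the loopback interface and will not accept connections from outside the VM, regardless of the Network Security Group rules.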
From your description, I suspect nothing is listening on the port anymore. If you started TensorBoard from your SSH session with just
tensorboard --logdir=run1:/tmp/tensorflow/
the process is tied to that session: when the SSH session expires or is closed, TensorBoard is stopped and you can no longer connect to it. Start it with the following command instead, and it will keep running (and stay reachable) even after your SSH session expires or is closed:
nohup tensorboard --logdir=run1:/tmp/tensorflow/ &
From man nohup:
nohup - run a command immune to hangups, with output to a non-tty
The trailing & runs the command in the background, so you get your shell back immediately.
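A slightly more complete variant of the nohup approach also keeps TensorBoard's output in a log file and records its PID, so you can later check whether it is still alive or stop it. This is only a sketch: start_detached and is_running are hypothetical helper names, and the ~/tensorboard.log and ~/tensorboard.pid paths are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for running a long-lived command with nohup.

# start_detached LOGFILE PIDFILE COMMAND [ARGS...]
# Runs COMMAND in the background, immune to hangups, sends stdout and stderr
# to LOGFILE, and writes the background process ID to PIDFILE.
start_detached() {
  local log=$1 pidfile=$2
  shift 2
  nohup "$@" >"$log" 2>&1 &
  echo $! >"$pidfile"
}

# is_running PIDFILE: succeed if the process recorded in PIDFILE still exists.
is_running() {
  [ -f "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null
}

# Usage on the VM (paths are assumptions):
#   start_detached ~/tensorboard.log ~/tensorboard.pid \
#       tensorboard --logdir=run1:/tmp/tensorflow/
#   is_running ~/tensorboard.pid || echo "TensorBoard has stopped"
#   kill "$(cat ~/tensorboard.pid)"   # stop it when you no longer need it
```

Keeping the log file means that if TensorBoard dies again, ~/tensorboard.log should show why.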
As an alternative to nohup, you can achieve a similar result by running TensorBoard inside a screen session:
:~$ screen -S tensorboard-screen
:~$ tensorboard --logdir=run1:/tmp/tensorflow/
Then press Ctrl+a followed by d to detach from the screen session and return to the main shell. The session keeps running after you exit SSH. Once you have logged back in, run
screen -r tensorboard-screen
to reattach to the session.
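The screen steps can also be scripted so that TensorBoard is started only if the session does not already exist. A sketch, assuming GNU screen; has_screen_session and ensure_tensorboard_screen are hypothetical helper names, and the pattern assumes the usual screen -ls line format (a PID, a dot, the session name, then whitespace).

```shell
#!/usr/bin/env bash
# Hypothetical helpers around GNU screen for keeping TensorBoard alive.

# has_screen_session NAME: read `screen -ls` output on stdin and succeed if a
# session called NAME is listed (lines look like "<tab>12345.NAME<tab>(Detached)").
has_screen_session() {
  grep -Eq "^[[:space:]]*[0-9]+\.$1[[:space:]]"
}

# ensure_tensorboard_screen: start TensorBoard in a detached screen session
# named tensorboard-screen unless such a session is already listed.
ensure_tensorboard_screen() {
  if screen -ls | has_screen_session tensorboard-screen; then
    echo "tensorboard-screen is already running"
  else
    # -d -m: create the session already detached; -S: give it a name
    screen -dmS tensorboard-screen tensorboard --logdir=run1:/tmp/tensorflow/
  fi
}
```

Starting the session with -d -m skips the manual Ctrl+a, d step; you can still attach to it later with screen -r tensorboard-screen.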