This is the Beholder Plugin, it allows for visualisation of all trainable variables (with sensible restrictions for massively deep networks).
My problem is that I am running my training using the tf.estimator.Estimator
class and it appears that the Beholder plugin does not play nicely with the Estimator
API.
My code looks like this:
# tf.data input pipeline setup
def dataset_input_fn(train=True):
filenames = ... # training files
if not train:
filenames = ... # test files
dataset = tf.data.TFRecordDataset(filenames), "GZIP")
# ... and so on until ...
iterator = batched_dataset.make_one_shot_iterator()
return iterator.get_next()
def train_input_fn():
return dataset_input_fn(train=True)
def test_input_fn():
return dataset_input_fn(train=False)
# model function
def cnn(features, labels, mode, params):
# build model
# Provide an estimator spec for `ModeKeys.PREDICT`.
if mode == tf.estimator.ModeKeys.PREDICT:
return tf.estimator.EstimatorSpec(
mode=mode,
predictions={"sentiment": y_pred_cls})
eval_metric_ops = {
"accuracy": accuracy_op,
"precision": precision_op,
"recall": recall_op
}
normal_summary_hook = tf.train.SummarySaverHook(
100,
summary_op=summary_op)
return tf.estimator.EstimatorSpec(
mode=mode,
loss=cost_op,
train_op=train_op,
eval_metric_ops=eval_metric_ops,
training_hooks=[normal_summary_hook]
)
classifier = tf.estimator.Estimator(model_fn=cnn,
params=...,
model_dir=...)
classifier.train(input_fn=train_input_fn, steps=1000)
ev = classifier.evaluate(input_fn=test_input_fn, steps=1000)
tf.logging.info("Loss: {}".format(ev["loss"]))
tf.logging.info("Precision: {}".format(ev["precision"]))
tf.logging.info("Recall: {}".format(ev["recall"]))
tf.logging.info("Accuracy: {}".format(ev["accuracy"]))
I can't figure out where to add the beholder hook in this setup.
If I add it in the cnn
function as a training hook:
return tf.estimator.EstimatorSpec(
mode=mode,
loss=dnn.cost,
train_op=dnn.train_op,
eval_metric_ops=eval_metric_ops,
training_hooks=[normal_summary_hook, beholder_hook]
)
then I get an InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype uint8 and shape [?,?,?]
.
If I try to use a tf.train.MonitoredTrainingSession
to setup the classifier
then the training proceeds as normal but nothing is logged to the beholder plugin. Looking at stdout I see two sessions being created one after the other, so it would appear that when you create a tf.estimator.Estimator
classifier it spins up its own session after terminating any existing sessions.
Does anyone have any ideas?
Edited post:
This is a problem with old tensorflow versions. Fortunately, the issue is fixed in tensorflow version 1.9! The code below uses Beholder with tf.estimator.Estimator. It produced the same error as you mention with an older version, but everything works perfectly in version 1.9!
from capser_7_model_fn import *
from tensorflow.python import debug as tf_debug
from tensorflow.python.training import basic_session_run_hooks
from tensorboard.plugins.beholder import Beholder
from tensorboard.plugins.beholder import BeholderHook
import logging
# create estimator for model (the model is described in capser_7_model_fn)
capser = tf.estimator.Estimator(model_fn=model_fn, params={'model_batch_size': batch_size}, model_dir=LOGDIR)
# train model
logging.getLogger().setLevel(logging.INFO) # to show info about training progress in the terminal
beholder = Beholder(LOGDIR)
beholder_hook = BeholderHook(LOGDIR)
capser.train(input_fn=train_input_fn, steps=n_steps, hooks=[beholder_hook])
Another aspect is that I need to specify exactly the same LOGDIR for the summary writer, the tensorboard command line call and the BeholderHook. Before, in order to compare different runs of my model, I wrote summaries for different runs in LOGDIR/run_1, then LOGDIR/run_2, etc. i.e.:
capser = tf.estimator.Estimator(model_fn=model_fn, params={'model_batch_size': batch_size}, model_dir=LOGDIR/run_n)
and I used
tensorboard -logdir=LOGDIR
to launch tensorboard and I used
beholder_hook = BeholderHook(LOGDIR)
to write beholder data. In that case, beholder did not find the data it needed. What I needed to do was to specify exactly the same LOGDIR for everything. I.e., in the code:
capser = tf.estimator.Estimator(model_fn=model_fn, params={'model_batch_size': batch_size}, model_dir=LOGDIR+'/run_n')
beholder_hook = BeholderHook(LOGDIR+'/run_n')
And to launch tensorboard in the terminal:
tensorboard -logdir=LOGDIR+'/run_n'
Hope that helps.