
Using Beholder plugin with tf.estimator.Estimator


This is the Beholder Plugin; it allows for visualisation of all trainable variables (with sensible restrictions for massively deep networks).

My problem is that I am running my training using the tf.estimator.Estimator class and it appears that the Beholder plugin does not play nicely with the Estimator API.

My code looks like this:

# tf.data input pipeline setup
def dataset_input_fn(train=True):
  filenames = ... # training files
  if not train:
    filenames = ... # test files

  dataset = tf.data.TFRecordDataset(filenames, compression_type="GZIP")

  # ... and so on until ...
  iterator = batched_dataset.make_one_shot_iterator()
  return iterator.get_next()
  
def train_input_fn():
  return dataset_input_fn(train=True)

def test_input_fn():
  return dataset_input_fn(train=False)

# model function
def cnn(features, labels, mode, params):
  # build model

  # Provide an estimator spec for `ModeKeys.PREDICT`.
  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions={"sentiment": y_pred_cls})

  eval_metric_ops = {
    "accuracy": accuracy_op,
    "precision": precision_op,
    "recall": recall_op
  }

  normal_summary_hook = tf.train.SummarySaverHook(
    save_steps=100,
    summary_op=summary_op)

  return tf.estimator.EstimatorSpec(
    mode=mode,
    loss=cost_op,
    train_op=train_op,
    eval_metric_ops=eval_metric_ops,
    training_hooks=[normal_summary_hook]
  )

classifier = tf.estimator.Estimator(model_fn=cnn,
                                    params=...,
                                    model_dir=...) 

classifier.train(input_fn=train_input_fn, steps=1000)
ev = classifier.evaluate(input_fn=test_input_fn, steps=1000)

tf.logging.info("Loss: {}".format(ev["loss"]))
tf.logging.info("Precision: {}".format(ev["precision"]))
tf.logging.info("Recall: {}".format(ev["recall"]))
tf.logging.info("Accuracy: {}".format(ev["accuracy"]))
  

I can't figure out where to add the beholder hook in this setup. If I add it in the cnn function as a training hook:

return tf.estimator.EstimatorSpec(
  mode=mode,
  loss=dnn.cost,
  train_op=dnn.train_op,
  eval_metric_ops=eval_metric_ops,
  training_hooks=[normal_summary_hook, beholder_hook]
)

then I get an InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype uint8 and shape [?,?,?].
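
For reference, the beholder_hook referenced above is created with the plugin's session-run hook, roughly as follows (the exact log directory isn't shown in my snippets, so the path here is purely illustrative):

from tensorboard.plugins.beholder import BeholderHook

# Illustrative only: in practice the log directory matches the Estimator's model_dir.
beholder_hook = BeholderHook("/path/to/model_dir")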

If I try to use a tf.train.MonitoredTrainingSession to set up the classifier, training proceeds as normal but nothing is logged to the Beholder plugin. Looking at stdout I see two sessions being created one after the other, so it appears that when you create a tf.estimator.Estimator classifier it spins up its own session after terminating any existing sessions.

Does anyone have any ideas?


Solution

  • Edited post:

    This is a problem with old TensorFlow versions. Fortunately, the issue is fixed in TensorFlow 1.9! The code below uses Beholder with tf.estimator.Estimator. It produced the same error you mention with an older version, but everything works perfectly in 1.9!

    import logging
    import tensorflow as tf  # tf.estimator is used below
    from capser_7_model_fn import *
    from tensorflow.python import debug as tf_debug
    from tensorflow.python.training import basic_session_run_hooks
    from tensorboard.plugins.beholder import Beholder
    from tensorboard.plugins.beholder import BeholderHook
    
    # create estimator for model (the model is described in capser_7_model_fn)
    capser = tf.estimator.Estimator(model_fn=model_fn, params={'model_batch_size': batch_size}, model_dir=LOGDIR)
    
    # train model
    logging.getLogger().setLevel(logging.INFO)  # to show info about training progress in the terminal
    beholder = Beholder(LOGDIR)
    beholder_hook = BeholderHook(LOGDIR)
    capser.train(input_fn=train_input_fn, steps=n_steps, hooks=[beholder_hook])
    

    Another point is that I need to specify exactly the same LOGDIR for the summary writer, the tensorboard command-line call, and the BeholderHook. Previously, in order to compare different runs of my model, I wrote the summaries for each run to LOGDIR/run_1, LOGDIR/run_2, and so on, i.e.:

    capser = tf.estimator.Estimator(model_fn=model_fn, params={'model_batch_size': batch_size}, model_dir=LOGDIR + '/run_n')
    

    and I used

    tensorboard --logdir=LOGDIR
    

    to launch tensorboard and I used

    beholder_hook = BeholderHook(LOGDIR)
    

    to write beholder data. In that case, beholder did not find the data it needed. What I needed to do was to specify exactly the same LOGDIR for everything. I.e., in the code:

    capser = tf.estimator.Estimator(model_fn=model_fn, params={'model_batch_size': batch_size}, model_dir=LOGDIR+'/run_n')
    beholder_hook = BeholderHook(LOGDIR+'/run_n')
    

    And to launch tensorboard in the terminal:

    tensorboard --logdir=LOGDIR/run_n
    
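    For example, one way to keep all the paths consistent is to build the run directory once and reuse it everywhere (a minimal sketch; the directory names are placeholders, and model_fn, batch_size, n_steps and train_input_fn are the same as in the snippet above):

    import os
    import tensorflow as tf
    from tensorboard.plugins.beholder import BeholderHook

    LOGDIR = '/tmp/capser_logs'                 # base log directory (placeholder)
    run_logdir = os.path.join(LOGDIR, 'run_1')  # one sub-directory per run

    # model_dir and the BeholderHook must point at the exact same run directory,
    # otherwise Beholder cannot find its data.
    capser = tf.estimator.Estimator(model_fn=model_fn,
                                    params={'model_batch_size': batch_size},
                                    model_dir=run_logdir)
    beholder_hook = BeholderHook(run_logdir)
    capser.train(input_fn=train_input_fn, steps=n_steps, hooks=[beholder_hook])

    and then launch tensorboard against that same run directory, e.g. tensorboard --logdir=/tmp/capser_logs/run_1.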

    Hope that helps.