Tags: python, tensorflow, tensorflow-hub, elmo

Strongly increasing memory consumption when using ELMo from TensorFlow Hub


I am currently trying to compare the similarity of millions of documents. For a first test on a CPU, I truncated each of them to around 50 characters and try to get the ELMo embeddings for 10 of them at a time, like this:

import tensorflow_hub as hub

ELMO = "https://tfhub.dev/google/elmo/2"
texts = []
i = 0
for row in file:
    split = row.split(";", 1)
    if len(split) > 1:
        text = split[1].replace("\n", "")
        texts.append(text[:50])
    if i == 300:
        break
    if i % 10 == 0:
        elmo = hub.Module(ELMO, trainable=False)
        executable = elmo(
            texts,
            signature="default",
            as_dict=True)["elmo"]

    vectors = execute(executable)  # helper (not shown) that runs `executable` in a session
    texts = []
    i += 1

However, even with this small example, after around 300 sentences (and without even saving the vectors) the program consumes up to 12 GB of RAM. Is this a known issue (the other issues I found suggest something similar, but not quite that extreme), or did I make a mistake?


Solution

  • This is for TensorFlow 1.x without Eager mode, I suppose (or else the use of hub.Module would likely hit bigger problems).

    In that programming model, you need to first express your computation in a TensorFlow graph, and then execute that graph repeatedly for each batch of data.

    • Constructing the module with hub.Module() and applying it to map an input tensor to an output tensor are both parts of graph building and should happen only once.

    • The loop over the input data should merely call session.run() to feed input and fetch output data from the fixed graph.

    Fortunately, there is already a utility function to do all this for you:

    import numpy as np
    import tensorflow_hub as hub
    
    # For demo use only. Extend to your actual I/O needs as you see fit.
    inputs = (x for x in ["hello world", "quick brown fox"])
    
    with hub.eval_function_for_module("https://tfhub.dev/google/elmo/2") as f:
      for pystr in inputs:
        batch_in = np.array([pystr])
        batch_out = f(batch_in)
        print(pystr, "--->", batch_out[0])
    
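    Note that the function yielded by hub.eval_function_for_module accepts a whole batch of strings at once, since the underlying placeholder has shape [None] (see the explicit example below). Since you already process 10 documents at a time, you can feed each batch in a single call. A minimal sketch, where the texts list is a stand-in for your actual data:

    import numpy as np
    import tensorflow_hub as hub

    texts = ["first doc", "second doc", "third doc"]  # stand-in for your data
    batch_size = 10

    with hub.eval_function_for_module("https://tfhub.dev/google/elmo/2") as f:
      for start in range(0, len(texts), batch_size):
        batch_in = np.array(texts[start:start + batch_size])
        batch_out = f(batch_in)  # one output row per input string
        print(batch_out.shape)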

    What hub.eval_function_for_module does for you in terms of raw TensorFlow is roughly this:

    import numpy as np
    import tensorflow as tf
    import tensorflow_hub as hub

    module = hub.Module(ELMO_OR_WHATEVER)
    tensor_in = tf.placeholder(tf.string, shape=[None])  # As befits `module`.
    tensor_out = module(tensor_in)

    # This kind of session handles init ops for you.
    with tf.train.SingularMonitoredSession() as sess:
      for pystr in inputs:
        batch_in = np.array([pystr])
        batch_out = sess.run(tensor_out, feed_dict={tensor_in: batch_in})
        print(pystr, "--->", batch_out[0])
    

    If your needs are too complex for the with hub.eval_function_for_module(...) utility, you can build on this more explicit example.

    Notice how the hub.Module is neither constructed nor called in the loop.
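
    For instance, applying that structure to the loop from the question: construct the module and the output tensor once, then call sess.run() for every batch of 10. A sketch, reusing the file iteration and parsing from the question:

    import numpy as np
    import tensorflow as tf
    import tensorflow_hub as hub

    ELMO = "https://tfhub.dev/google/elmo/2"

    # Graph building: happens exactly once, before the loop.
    elmo = hub.Module(ELMO, trainable=False)
    tensor_in = tf.placeholder(tf.string, shape=[None])
    tensor_out = elmo(tensor_in, signature="default", as_dict=True)["elmo"]

    with tf.train.SingularMonitoredSession() as sess:
      texts = []
      for row in file:  # `file` as in the question
        split = row.split(";", 1)
        if len(split) > 1:
          texts.append(split[1].replace("\n", "")[:50])
        if len(texts) == 10:
          # Graph execution: once per batch, against the fixed graph.
          vectors = sess.run(tensor_out, feed_dict={tensor_in: np.array(texts)})
          texts = []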

    PS: Tired of worrying about building graphs vs running sessions? Then TF2 and eager execution are for you. Check out https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/tf2_text_classification.ipynb