Tags: python, tensorflow, tflearn

TFLearn regression, shape incompatibility in loss calculation


I am working with protein sequences. My goal is to create a convolutional network which will predict three angles for each amino acid in the protein. I'm having trouble debugging a TFLearn DNN model that requires a reshape operation.

The input data describes (currently) 25 proteins of varying lengths. To use Tensors I need uniform dimensions, so I pad the shorter sequences with zeros. Each amino acid is represented by a 4-dimensional code. The details of that are probably unimportant, other than to help you understand the shapes of the Tensors.
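For concreteness, a minimal sketch of the kind of zero-padding involved (the sizes here are made up for illustration):

import numpy as np

# Two hypothetical proteins of different lengths, 4 numbers per amino acid
proteins = [np.random.rand(5, 4), np.random.rand(3, 4)]
max_len = max(p.shape[0] for p in proteins)  # longest protein in the set

# Zero-pad each protein along the length axis so all share one shape
padded = np.stack([np.pad(p, ((0, max_len - p.shape[0]), (0, 0)), "constant")
                   for p in proteins])
print(padded.shape)  # (2, 5, 4): (proteins, max_length, code)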

The output of the DNN is six numbers, representing the sines and cosines of three angles. To create ordered pairs, the DNN graph reshapes a [..., 6] Tensor to [..., 3, 2]. My target data is encoded the same way. I calculate the loss using cosine distance.
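One property of this encoding is worth noting: the dot product of two unit vectors (sin a, cos a) and (sin b, cos b) is sin(a)sin(b) + cos(a)cos(b) = cos(a - b), so their cosine distance is 1 - cos(a - b), which is zero exactly when the predicted and target angles agree. A small sketch of the encoding (the angle values are made up):

import numpy as np

# Hypothetical backbone angles for one amino acid, in radians
angles = np.array([0.5, -1.2, 3.0])

# Six numbers: the sine and cosine of each angle, flattened ...
flat = np.stack([np.sin(angles), np.cos(angles)], axis=-1).reshape(6)

# ... then reshaped [..., 6] -> [..., 3, 2] to recover the ordered pairs
pairs = flat.reshape(3, 2)
print(pairs)  # row i is (sin, cos) of angle i; each row is a unit vector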

I previously built a non-convolutional DNN, very similar to the code I will post here, which showed good initial learning behavior. But that model treated each group of three adjacent amino acids in isolation. I want to treat each protein as a unit -- with sliding windows 3 amino acids wide at first, and eventually larger.

Now that I am converting to a convolutional model, I can't seem to get the shapes to match. Here are the working portions of my code:

import tensorflow as tf
import tflearn as tfl

from protein import ProteinDatabase   # don't worry about its details

def backbone_angle_distance(predict, actual):
    with tf.name_scope("BackboneAngleDistance"):
        actual = tfl.reshape(actual, [-1,3,2])
        # Supply the -1 argument for axis that TFLearn can't pass
        loss = tf.losses.cosine_distance(predict, actual, -1, 
               reduction=tf.losses.Reduction.MEAN)
        return loss

# Training data
database = ProteinDatabase("./data")
inp, tgt = database.training_arrays()

# DNN model, convolution only in topmost layer for now
net = tfl.input_data(shape=[None, None, 4]) 
net = tfl.conv_1d(net, 24, 3)
net = tfl.conv_1d(net, 12, 1)
net = tfl.conv_1d(net, 6, 1)
net = tfl.reshape(net, [-1,3,2]) 
net = tf.nn.l2_normalize(net, dim=2)
net = tfl.regression(net, optimizer="sgd", learning_rate=0.1, \
                     loss=backbone_angle_distance)
model = tfl.DNN(net)

# Generate a prediction.  Compare shapes for compatibility.
out = model.predict(inp)
print("\ninp : {}, shape = {}".format(type(inp), inp.shape))
print("out : {}, shape = {}".format(type(out), out.shape))
print("tgt : {}, shape = {}".format(type(tgt), tgt.shape))
print("tgt shape, if flattened by one dimension = {}\n".\
      format(tgt.reshape([-1,3,2]).shape))

The output at this point is:

inp : <class 'numpy.ndarray'>, shape = (25, 543, 4)
out : <class 'numpy.ndarray'>, shape = (13575, 3, 2)
tgt : <class 'numpy.ndarray'>, shape = (25, 543, 3, 2)
tgt shape, if flattened by one dimension = (13575, 3, 2)

So if I reshape the 4D Tensor tgt, merging its first two dimensions, out and tgt should match. Since TFLearn's code makes the batches, I try to intercept and reshape the Tensor actual in the first line of backbone_angle_distance(), my custom loss function.

If I add a few lines to attempt model fitting as follows:

e, b = 1, 5
model.fit(inp, tgt, n_epoch=e, batch_size=b, validation_set=0.2, show_metric=True)

I get the following extra output and error:

---------------------------------
Run id: EEG6JW
Log directory: /tmp/tflearn_logs/
---------------------------------
Training samples: 20
Validation samples: 5
--
--
Traceback (most recent call last):
  File "exp54.py", line 252, in <module>
    model.fit(inp, tgt, n_epoch=e, batch_size=b, validation_set=0.2, show_metric=True)
  File "/usr/local/lib/python3.5/dist-packages/tflearn/models/dnn.py", line 216, in fit
    callbacks=callbacks)
  File "/usr/local/lib/python3.5/dist-packages/tflearn/helpers/trainer.py", line 339, in fit
    show_metric)
  File "/usr/local/lib/python3.5/dist-packages/tflearn/helpers/trainer.py", line 818, in _train
    feed_batch)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 975, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (5, 543, 3, 2) for Tensor 'TargetsData/Y:0', which has shape '(?, 3, 2)'

Where in my code am I SPECIFYING that TargetsData/Y:0 has shape (?, 3, 2)? I know the batches TFLearn feeds it won't have that shape. According to the traceback, execution never even reaches the reshape operation in backbone_angle_distance().

Any advice is appreciated, thanks!


Solution

  • Well, it looks like I am answering my own question.

    I tried various permutations of what Geert was suggesting, and I couldn't make anything work. When I was building the non-convolutional network that preceded the one I am discussing here, reshaping the training data to [-1,3,2] was appropriate. Eventually, I concluded that TFLearn will not let me flatten the 4D target Tensor that the CNN needs inside the loss function. As before, I need to add a dimension, but instead of preserving one dimension (which is what -1 does), I now have to preserve TWO.
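    In hindsight, the (?, 3, 2) in the error message has a simple source: as far as I can tell, TFLearn's regression layer builds the TargetsData/Y placeholder from the static shape of the incoming tensor, and after tfl.reshape(net, [-1,3,2]) that static shape is (?, 3, 2). A quick check (layer sizes arbitrary):

    net = tfl.input_data(shape=[None, None, 4])
    net = tfl.conv_1d(net, 6, 1)
    net = tfl.reshape(net, [-1, 3, 2])
    print(net.get_shape())  # (?, 3, 2) -- the shape the Y placeholder inherits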

    Here is my solution.

    1) Eliminate the reshape from the loss function:

    def backbone_angle_distance(predict, actual):
        with tf.name_scope("BackboneAngleDistance"):
            # Supply the -1 argument for axis that TFLearn can't pass
            loss = tf.losses.cosine_distance(predict, actual, -1, 
                   reduction=tf.losses.Reduction.MEAN)
            return loss
    

    2) Introduce the variable shp, which captures the run-time dimensions of the 3D input Tensor:

    net = tfl.input_data(shape=[None, None, 4])
    shp = tf.shape(net)  # <--- (new)
    net = tfl.conv_1d(net, 24, 3)
    net = tfl.conv_1d(net, 12, 1)
    net = tfl.conv_1d(net, 6, 1)
    net = tfl.reshape(net, [shp[0], shp[1], 3, 2])  # <--- (new)
    net = tf.nn.l2_normalize(net, dim=3)  # <--- (changed: the (sin, cos) pairs now sit on axis 3)
    net = tfl.regression(net, optimizer="sgd", learning_rate=0.1, \
                         loss=backbone_angle_distance)
    model = tfl.DNN(net)
    

    The shape-related errors that I had earlier are now gone. But if anyone is still following this, I have further questions.

    a) Did I do it "right"? This algorithm will probably never be trained on a distributed system, as the data set that I have is too small to bother. However, it is my understanding that anything that a TensorFlow graph uses that is not itself a TensorFlow object has the potential to break any parallelizing optimizations that could be performed. Is shp a proper TensorFlow object? How about its elements, which I obtain by slicing operations?
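    For what it is worth, one way to probe question a) in an interactive session (TF 1.x, matching the rest of this post):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, None, 4])
    shp = tf.shape(x)  # a 1-D int32 Tensor, evaluated at run time
    elem = shp[1]      # slicing a Tensor yields another (scalar) Tensor

    print(type(shp))   # a tf.Tensor
    print(type(elem))  # also a tf.Tensor (a strided-slice op in the graph)

    Both the shape and its slices are ordinary graph Tensors, so they appear to stay entirely inside the TensorFlow graph.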

    b) If I were working in Numpy, this looks like a job for Python's ellipsis operator. I even unconsciously wrote my initial description of the Tensor shapes at the top of this discussion using ellipses. Does TensorFlow understand the ellipsis? It could be useful.
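    For comparison on question b): NumPy's ellipsis works as you would expect, and TensorFlow's slicing syntax accepts an ellipsis as well, although tf.reshape has no equivalent shorthand. A small sketch:

    import numpy as np
    import tensorflow as tf

    a = np.zeros((25, 543, 3, 2))
    print(a[..., 0].shape)        # (25, 543, 3): ... fills in the leading axes

    t = tf.placeholder(tf.float32, [None, None, 3, 2])
    print(t[..., 0].get_shape())  # (?, ?, 3): TF slicing understands ... too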