I am doing word-level language modelling with a vanilla rnn, I am able to train the model but for some weird reasons I am not able to get any samples/predictions from the model; here is the relevant part of the code:
train_set_x, train_set_y, voc = load_data(dataset, vocab, vocab_enc) # just load all data as shared variables
index = T.lscalar('index')
x = T.fmatrix('x')
y = T.ivector('y')
n_x = len(vocab)
n_h = 100
n_y = len(vocab)
rnn = Rnn(input=x, input_dim=n_x, hidden_dim=n_h, output_dim=n_y)
cost = rnn.negative_log_likelihood(y)
updates = get_optimizer(optimizer, cost, rnn.params, learning_rate)
train_model = theano.function(
inputs=[index],
outputs=cost,
givens={
x: train_set_x[index],
y: train_set_y[index]
},
updates=updates
)
predict_model = theano.function(
inputs=[index],
outputs=rnn.y,
givens={
x: voc[index]
}
)
sampling_freq = 2
sample_length = 10
n_train_examples = train_set_x.get_value(borrow=True).shape[0]
train_cost = 0.
for i in xrange(n_train_examples):
train_cost += train_model(i)
train_cost /= n_train_examples
if i % sampling_freq == 0:
# sample from the model
seed = randint(0, len(vocab)-1)
idxes = []
for j in xrange(sample_length):
p = predict_model(seed)
seed = p
idxes.append(p)
# sample = ''.join(ix_to_words[ix] for ix in idxes)
# print(sample)
I get the error: "TypeError: ('Bad input argument to theano function with name "train.py:94" at index 0(0-based)', 'Wrong number of dimensions: expected 0, got 1 with shape (1,).')"
Now this corresponds to the following line (in the predict_model):
givens={ x: voc[index] }
Even after spending hours I am not able to comprehend how could there be a dimension mis-match when:
train_set_x has shape: (42, 4, 109)
voc has shape: (109, 1, 109)
And when I do train_set_x[index], I am getting (4, 109) which 'x' Tensor of type fmatrix can hold (this is what happens in train_model) but when I do voc[index], I am getting (1, 109), which is also a matrix but 'x' cannot hold this, why ? !
Any help will be much appreciated.
Thanks !
The error message refers to the definition of the whole Theano function named predict_model
, not the specific line where the substitution with givens
occurs.
The issue seems to be that predict_model
gets called with an argument that is a vector of length 1 instead of a scalar. The initial seed
sampled from randint
is actually a scalar, but I would guess that the output p
of predict_model(seed)
is a vector and not a scalar.
In that case, you could either return rnn.y[0]
in predict_model
, or replace seed = p
with seed = p[0]
in the loop over j
.