I have a strange error that I can't make sense of when compiling a scan operator in Theano. When outputs_info is initialized with a last dimension equal to one, I get this error:
TypeError: ('The following error happened while compiling the node', forall_inplace,cpu,
scan_fn}(TensorConstant{4}, IncSubtensor{InplaceSet;:int64:}.0, <TensorType(float32, vector)>),
'\n', "Inconsistency in the inner graph of scan 'scan_fn' : an input and an output are
associated with the same recurrent state and should have the same type but have type
'TensorType(float32, (True,))' and 'TensorType(float32, vector)' respectively.")
whereas I get no error if this dimension is set to anything greater than one.
This error happens on both GPU and CPU targets, with Theano 0.7, 0.8.0 and 0.8.2.
Here is a piece of code to reproduce the error:
import theano
import theano.tensor as T
import numpy as np
def rec_fun(prev_output, bias):
    return prev_output + bias
n_steps = 4
# with state_size>1, compilation runs smoothly
state_size = 2
bias = theano.shared(np.ones((state_size,), dtype=theano.config.floatX))
(outputs, updates) = theano.scan(fn=rec_fun,
                                 sequences=[],
                                 outputs_info=T.zeros([state_size,]),
                                 non_sequences=[bias],
                                 n_steps=n_steps)
print outputs.eval()
# with state_size==1, compilation fails
state_size = 1
bias = theano.shared(np.ones((state_size,), dtype=theano.config.floatX))
(outputs, updates) = theano.scan(fn=rec_fun,
                                 sequences=[],
                                 outputs_info=T.zeros([state_size,]),
                                 non_sequences=[bias],
                                 n_steps=n_steps)
# compilation fails here
print outputs.eval()
Compilation thus behaves differently depending on state_size. Is there a workaround that handles both cases, state_size==1 and state_size>1?
Changing
outputs_info=T.zeros([state_size,])
to
outputs_info=T.zeros_like(bias)
makes it work properly for the case state_size == 1.
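For reference, here is a sketch of the question's second (previously failing) repro with only that one line changed, so you can see the fix in context:

import theano
import theano.tensor as T
import numpy as np

def rec_fun(prev_output, bias):
    return prev_output + bias

n_steps = 4
state_size = 1
bias = theano.shared(np.ones((state_size,), dtype=theano.config.floatX))
# outputs_info is built from bias itself, so the initial state has the same
# broadcastable pattern as bias and matches the type of the inner output
(outputs, updates) = theano.scan(fn=rec_fun,
                                 sequences=[],
                                 outputs_info=T.zeros_like(bias),
                                 non_sequences=[bias],
                                 n_steps=n_steps)
print outputs.eval()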
Minor explanation and different solution
So I noticed a crucial difference between the two cases. Add these lines of code right after the bias declaration line in both cases.
bias = ....
print bias.broadcastable
print T.zeros([state_size,]).broadcastable
The results are:
For the first case, where your code works:
(False,)
(False,)
And for the second case, where it seems to break down:
(False,)
(True,)
So what happens is that when you add two tensors of the same dimensions (bias and T.zeros) but with different broadcastable patterns, the result inherits the pattern of bias, i.e. (False,). The initial state passed through outputs_info, however, keeps its (True,) pattern, so Theano sees the input and the output of the recurrent state as different types and raises the error above.
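You can see this inheritance outside of scan with a quick check (a minimal sketch, reusing the imports from the question's code):

x = T.zeros([1,])                                             # broadcastable (True,)
y = theano.shared(np.ones((1,), dtype=theano.config.floatX))  # broadcastable (False,)
print (x + y).broadcastable                                   # (False,) -- the sum inherits y's pattern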
T.zeros_like works because it uses the bias variable to generate the zeros tensor, so the initial state shares bias's (False,) broadcastable pattern.
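You can verify that directly (reusing the bias shared variable defined above):

print T.zeros_like(bias).broadcastable   # (False,), same pattern as bias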
Another way to fix your problem is to change the broadcasting pattern explicitly, like so:
outputs_info=T.patternbroadcast(T.zeros([state_size,]), (False,)),
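Dropped into the question's repro (same definitions of rec_fun, state_size, bias and n_steps as in the failing case), that would look roughly like this:

(outputs, updates) = theano.scan(fn=rec_fun,
                                 sequences=[],
                                 # force the initial state to be non-broadcastable so its type
                                 # matches the inner output prev_output + bias
                                 outputs_info=T.patternbroadcast(T.zeros([state_size,]), (False,)),
                                 non_sequences=[bias],
                                 n_steps=n_steps)
print outputs.eval()

This keeps the T.zeros initializer but explicitly gives it the (False,) pattern, so both workarounds amount to the same thing: making the initial state's broadcastable pattern match the inner output's.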