I am new to PyTorch and trying to reproduce the project: https://github.com/eXascaleInfolab/ActiveLink
However, an error occurs in forward() that has been bothering me for days. Here is part of the code (for the complete model code, please see https://github.com/eXascaleInfolab/ActiveLink/blob/master/models.py):
def forward(self, e1, rel, batch_size=None, weights=None):
    ......
    e1_embedded = self.emb_e(e1).view(-1, 1, 10, 20)
    rel_embedded = self.emb_rel(rel).view(-1, 1, 10, 20)
    stacked_inputs = torch.cat([e1_embedded, rel_embedded], 2) # out: (128L, 1L, 20L, 20L)
That gives me the error (I am using GPU):
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=196 error=710 : device-side assert triggered
Traceback (most recent call last):
File "main.py", line 147, in <module>
main()
File "main.py", line 136, in main
model = run_meta_incremental(config, model, train_batcher, test_rank_batcher)
File "/home/yonghui/yt/meta_incr_training.py", line 158, in run_meta_incremental
g = run_inner(config, model, task)
File "/home/yonghui/yt/meta_incr_training.py", line 120, in run_inner
pred = model.forward(e1, rel)
File "/home/yonghui/yt/models.py", line 136, in forward
stacked_inputs = torch.cat([e1_embedded, rel_embedded], 2)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:196
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
I used the debugger to try to find out where things go wrong:
Before e1 and rel are embedded, they are both int64 tensors with shape torch.Size([128, 1]). e1 is embedded normally, becoming a torch.float32 tensor with shape torch.Size([128, 1, 10, 20]). However, after rel passes through the embedding layer emb_rel, the debugger shows every tensor as "Unable to get repr for <class 'torch.Tensor'>".
What's going on? How can I fix that? Thank you for any possible help!!
This issue was solved by using the debugger to check the input tensors.
After checking the tensors before embedding, I found that some index values exceeded the valid range of the embedding table. Keep in mind that indices start from 0, so the largest valid index is one less than the number of embeddings.
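The check described above can be sketched as a small guard that runs before the embedding lookup. This is only an illustration: the helper name check_embedding_indices, the table size of 5, and the sample index tensors are assumptions made for the example, not values from the ActiveLink code.

```python
import torch
import torch.nn as nn


def check_embedding_indices(name, indices, embedding):
    """Raise a readable error if any index falls outside the embedding table.

    Hypothetical helper written for this example.
    """
    num_rows = embedding.num_embeddings  # valid indices are 0 .. num_rows - 1
    lo = int(indices.min())
    hi = int(indices.max())
    if lo < 0 or hi >= num_rows:
        raise IndexError(
            f"{name}: indices span [{lo}, {hi}], "
            f"but the table only accepts [0, {num_rows - 1}]"
        )


# Assumed sizes, chosen only for illustration (200 = 10 * 20, matching the view).
emb_rel = nn.Embedding(num_embeddings=5, embedding_dim=200)

ok = torch.tensor([[0], [4]])   # both indices lie within 0..4
bad = torch.tensor([[1], [5]])  # 5 is out of range for a 5-row table

check_embedding_indices("rel", ok, emb_rel)  # passes silently
rel_embedded = emb_rel(ok).view(-1, 1, 10, 20)

try:
    check_embedding_indices("rel", bad, emb_rel)
    message = None
except IndexError as err:
    message = str(err)
print(message)
```

A guard like this also explains why the traceback points at torch.cat rather than at the embedding call: CUDA kernels run asynchronously, so the failed assertion in the lookup is only reported at a later call. Setting the environment variable CUDA_LAUNCH_BLOCKING=1, or running the same batch on the CPU, usually makes the error surface at the line that actually caused it.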