python tensorflow lstm

CUDA LSTM "unspecified launch failure" error


I have an Nvidia GTX 1050 card with CUDA 10.1 and cuDNN 7.6.5. Whenever I try to run LSTM cells, a bunch of errors are raised.

Here is my code:

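from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# x_train is 3-D: (samples, timesteps, features); y is 2-D: (samples, outputs)
model = Sequential()
model.add(LSTM(64, input_shape=(x_train.shape[1], x_train.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(64, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(64, return_sequences=True))
model.add(Dropout(0.2))
# Last LSTM layer returns only its final output, not the full sequence
model.add(LSTM(32))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(optimizer='adam', loss='mse')


model.fit(x_train, y, epochs=5, batch_size=16)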

And here is my TensorFlow version and the full traceback:

In [2]: tf.__version__
Out[2]: '2.3.0'

Traceback:

 Epoch 1/100
    2020-09-04 15:27:30.033120: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
    2020-09-04 15:27:31.436246: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
     27255/261088 [==>...........................] - ETA: 51:45 - loss: 0.0130
    2020-09-04 15:33:38.188521: E tensorflow/stream_executor/dnn.cc:616] CUDNN_STATUS_INTERNAL_ERROR
    in tensorflow/stream_executor/cuda/cuda_dnn.cc(1892): 'cudnnRNNForwardTraining( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, input_desc.handles(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), rnn_desc.params_handle(), params.opaque(), output_desc.handles(), output_data->opaque(), output_h_desc.handle(), output_h_data->opaque(), output_c_desc.handle(), output_c_data->opaque(), workspace.opaque(), workspace.size(), reserve_space.opaque(), reserve_space.size())'
    2020-09-04 15:33:38.191709: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
    2020-09-04 15:33:38.273883: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:220] Unexpected Event status: 1
    2020-09-04 15:33:38.256027: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQU

Solution

  • How much data do you send into your model at once? It looks to me like you are feeding too much data into your GPU at once, causing CUDA to crash, so try adjusting your batch_size. How long are your sequences? How much memory does your GPU have? Without more information about the data and whether CUDA and cuDNN are properly installed, it is difficult to give a more definite solution.
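
To get a feel for the two questions above (sequence size versus GPU memory), here is a minimal back-of-the-envelope sketch. It is not a TensorFlow API: the function name estimate_batch_bytes, the factor of 6 values per unit per timestep (4 gates plus cell and hidden state) and the timesteps/features values in the usage example are all assumptions for illustration. It ignores weights, optimizer state, the cuDNN workspace and framework overhead, so treat the result as a rough lower bound only.

```python
# Rough, hypothetical estimate of float32 memory needed for one batch
# passing through a stack of LSTM layers during training.
# Ignores weights, optimizer state, cuDNN workspace and framework overhead.

BYTES_PER_FLOAT32 = 4


def estimate_batch_bytes(batch_size, timesteps, features, lstm_units):
    """Return an approximate byte count for the inputs plus LSTM activations."""
    # Input tensor of shape (batch_size, timesteps, features)
    total = batch_size * timesteps * features * BYTES_PER_FLOAT32
    for units in lstm_units:
        # Assumption: per timestep we keep 4 gate activations plus the
        # cell state and hidden state, i.e. 6 values per unit.
        per_step_values = 6 * units
        total += batch_size * timesteps * per_step_values * BYTES_PER_FLOAT32
    return total


if __name__ == "__main__":
    # timesteps=50 and features=10 are placeholders, not values from the question
    n_bytes = estimate_batch_bytes(batch_size=16, timesteps=50, features=10,
                                   lstm_units=[64, 64, 64, 32])
    print(f"~{n_bytes / 1024**2:.2f} MiB per batch (activations only)")
```

Plug in x_train.shape[1] and x_train.shape[2] for timesteps and features and compare the result with the memory available on your card. If the estimate is already a large fraction of it, lowering batch_size or shortening the sequences is the first thing to try.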