Tags: python, keras, conv-neural-network, lstm, sentiment-analysis

Out of memory training CNN-LSTM with GPU in Jupyter notebook


I am trying to compile my hybrid CNN-LSTM model for sentiment analysis, but I get the following error:

OOM when allocating tensor with shape[9051,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:RandomUniform]

This is my GPU list; I want to use the RTX one:

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 13057500645716466504,
 name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 44957696
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 6095838710984840352
 physical_device_desc: "device: 0, name: NVIDIA RTX A6000, pci bus id: 0000:05:00.0, compute capability: 8.6",
 name: "/device:GPU:1"
 device_type: "GPU"
 memory_limit: 10648354816
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 10826802477734196135
 physical_device_desc: "device: 1, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1"]

This is my code:

# Imports assumed for this snippet (tensorflow.keras); they were not shown in the original
import tensorflow as tf
from tensorflow.keras.layers import (Input, Embedding, Conv1D, MaxPooling1D,
                                     Dropout, LSTM, BatchNormalization, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Build hybrid CNN-LSTM model
def build_cnn_lstm_model(num_words, embedding_vector_size, embedding_matrix, max_sequence_length):
    # Input layer
    input_layer = Input(shape=(max_sequence_length,))

    # Word embedding
    embedding_layer = Embedding(input_dim=num_words,
                              output_dim=embedding_vector_size,
                              weights=[embedding_matrix],
                              input_length=max_sequence_length)(input_layer)

    # CNN model
    # Bigrams extraction
    bigrams_convolution_layer = Conv1D(filters=256,
                                     kernel_size=2,
                                     strides=1,
                                     padding='valid',
                                     activation='relu')(embedding_layer)
    bigrams_max_pooling_layer = MaxPooling1D(pool_size=2,
                                           strides=1,
                                           padding='valid')(bigrams_convolution_layer)

    # Trigrams extraction
    trigrams_convolution_layer = Conv1D(filters=256,
                                     kernel_size=3,
                                     strides=1,
                                     padding='valid',
                                     activation='relu')(bigrams_max_pooling_layer)
    trigrams_max_pooling_layer = MaxPooling1D(pool_size=2,
                                           strides=1,
                                           padding='valid')(trigrams_convolution_layer)

    # Fourgrams extraction
    fourgrams_convolution_layer = Conv1D(filters=256,
                                      kernel_size=4,
                                      strides=1,
                                      padding='valid',
                                      activation='relu')(trigrams_max_pooling_layer)
    fourgrams_max_pooling_layer = MaxPooling1D(pool_size=2,
                                            strides=1,
                                            padding='valid')(fourgrams_convolution_layer)

    # Fivegrams extraction
    fivegrams_convolution_layer = Conv1D(filters=256,
                                      kernel_size=5,
                                      strides=1,
                                      padding='valid',
                                      activation='relu')(fourgrams_max_pooling_layer)
    fivegrams_max_pooling_layer = MaxPooling1D(pool_size=2,
                                            strides=1,
                                            padding='valid')(fivegrams_convolution_layer)

    # Dropout layer
    dropout_layer = Dropout(rate=0.5)(bigrams_max_pooling_layer)

    # LSTM model
    lstm_layer = LSTM(units=128,
                      activation='tanh',
                      return_sequences=False,
                      dropout=0.3,
                      return_state=False)(dropout_layer)

    # Batch normalization layer
    batch_norm_layer = BatchNormalization()(lstm_layer)

    # Classifier model
    dense_layer = Dense(units=10, activation='relu')(lstm_layer)
    output_layer = Dense(units=3, activation='softmax')(dense_layer)

    cnn_lstm_model = Model(inputs=input_layer, outputs=output_layer)

    return cnn_lstm_model

with tf.device('/device:GPU:0'):
    sinovac_cnn_lstm_model = build_cnn_lstm_model(SINOVAC_NUM_WORDS, 
                                                  SINOVAC_EMBEDDING_VECTOR_SIZE,
                                                  SINOVAC_EMBEDDING_MATRIX,
                                                  SINOVAC_MAX_SEQUENCE)
    sinovac_cnn_lstm_model.summary()

    sinovac_cnn_lstm_model.compile(loss='categorical_crossentropy',
                                   optimizer=Adam(lr=0.001),
                                   metrics=['accuracy'])

Strangely, when I used GPU:1, the GTX one, it worked. The GTX 1080 Ti obviously has less memory than the RTX A6000, so why does it produce an out-of-memory error when the model is compiled and trained on the RTX A6000? Any solution?


Solution

  • Even though the physical_device_desc calls it device: 0, it is the name entry under that, name: "/device:GPU:1", that is actually used. Likewise, even though the 1080 Ti calls itself device: 1 in its physical_device_desc field, it is actually "/device:GPU:0".

    In other words, use with tf.device('/device:GPU:0'): to use the 1080Ti, and with tf.device('/device:GPU:1'): to get the A6000.

    That sounds potentially fragile, but I just had a poke around the TensorFlow docs, and there seems to be no built-in function to identify a GPU by model name. So you'd need to run through the list of devices and match against that physical device name (or simply find the one with the most memory) to get the "GPU:nnn" name you need; a minimal sketch of that scan follows below.
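
    A minimal sketch of that scan, assuming the device_lib helper shown in the question; the model-name string 'A6000' and the pick_gpu name are just illustrations, not an existing API:

    # Sketch: map a GPU model name (or the largest memory_limit) to its "/device:GPU:n" name
    from tensorflow.python.client import device_lib

    def pick_gpu(model_substring=None):
        gpus = [d for d in device_lib.list_local_devices() if d.device_type == 'GPU']
        if model_substring is not None:
            for d in gpus:
                if model_substring in d.physical_device_desc:
                    return d.name                     # e.g. "/device:GPU:1"
        # Fall back to the GPU reporting the most memory, or the CPU if none is found
        return max(gpus, key=lambda d: d.memory_limit).name if gpus else '/device:CPU:0'

    # Usage: build and compile on whichever logical device the A6000 actually maps to
    # with tf.device(pick_gpu('A6000')):
    #     model = build_cnn_lstm_model(...)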