I am training a 3D U-Net to do multi-class (4 classes) semantic segmentation. Training with model.fit() runs just fine with no errors and I can see that the model is learning. However, when I try to run model.predict() I get the following error:
85/85 - 56s
2022-12-22 18:26:24.265485: F tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc:165] Non-OK-status: GpuLaunchKernel( concat_variable_kernel<T, IntType, true>, config.block_count, config.thread_per_block, smem_usage, gpu_device.stream(), input_ptrs, output_scan, static_cast<IntType>(output->dimension(0)), static_cast<IntType>(output->dimension(1)), output->data()) status: Internal: invalid configuration argument
/cm/local/apps/slurm/var/spool/job5510720/slurm_script: line 14: 1945 Aborted
Here's a simplified and abbreviated version of my code:
import tensorflow as tf
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.layers import Input, Conv3D, MaxPooling3D, Conv3DTranspose, UpSampling3D, Concatenate
def unet(input_shape, filters, kernel, model_name):
    strides_1 = (1, 1, 1)
    strides_2 = (2, 2, 2)
    ins = Input(shape=input_shape, name='input_1')
    # Encoding
    #--------------------------
    encode1a = Conv3D(filters=filters, kernel_size=kernel, activation='relu', padding='same', name='encode1a', strides=strides_1)(ins)
    encode1b = Conv3D(filters=filters, kernel_size=kernel, activation='relu', padding='same', name='encode1b', strides=strides_1)(encode1a)
    pool1 = MaxPooling3D(pool_size=(2, 2, 2), padding='same', name='pool1')(encode1b)
    encode2a = Conv3D(filters=2*filters, kernel_size=kernel, activation='relu', padding='same', name='encode2a', strides=strides_1)(pool1)
    encode2b = Conv3D(filters=2*filters, kernel_size=kernel, activation='relu', padding='same', name='encode2b', strides=strides_1)(encode2a)
    pool2 = MaxPooling3D(pool_size=(2, 2, 2), padding='same', name='pool2')(encode2b)
    encode3a = Conv3D(filters=4*filters, kernel_size=kernel, activation='relu', padding='same', name='encode3a', strides=strides_1)(pool2)
    encode3b = Conv3D(filters=4*filters, kernel_size=kernel, activation='relu', padding='same', name='encode3b', strides=strides_1)(encode3a)
    pool3 = MaxPooling3D(pool_size=(2, 2, 2), padding='same', name='pool3')(encode3b)
    # Bottleneck
    #--------------------------
    bottom_a = Conv3D(filters=8*filters, kernel_size=kernel, activation='relu', padding='same')(pool3)
    bottom_b = Conv3D(filters=8*filters, kernel_size=kernel, activation='relu', padding='same')(bottom_a)
    # Decoding
    #--------------------------
    up2 = Concatenate(axis=4)([Conv3DTranspose(filters=4*filters, kernel_size=(2, 2, 2), strides=strides_2, padding='same')(bottom_b), encode3b])
    decode2a = Conv3D(filters=4*filters, kernel_size=kernel, activation='relu', padding='same', name='decode2a')(up2)
    decode2b = Conv3D(filters=4*filters, kernel_size=kernel, activation='relu', padding='same', name='decode2b')(decode2a)
    up3 = Concatenate(axis=4)([Conv3DTranspose(filters=2*filters, kernel_size=(2, 2, 2), strides=strides_2, padding='same')(decode2b), encode2b])
    decode1a = Conv3D(filters=2*filters, kernel_size=kernel, activation='relu', padding='same', name='decode1a')(up3)
    decode1b = Conv3D(filters=2*filters, kernel_size=kernel, activation='relu', padding='same', name='decode1b')(decode1a)
    up4 = Concatenate(axis=4)([Conv3DTranspose(filters=filters, kernel_size=(2, 2, 2), strides=strides_2, padding='same')(decode1b), encode1b])
    decode0a = Conv3D(filters=filters, kernel_size=kernel, activation='relu', padding='same', name='decode0a')(up4)
    decode0b = Conv3D(filters=filters, kernel_size=kernel, activation='relu', padding='same', name='decode0b')(decode0a)
    # Output
    #--------------------------
    out = Conv3D(filters=4, kernel_size=(1, 1, 1), activation='softmax')(decode0b)
    model = Model(inputs=ins, outputs=out, name=model_name)
    return model
FILTERS = 32
KERNEL = (3,3,3)
MODEL_NAME = 'multi-unet-test'
LR = 3e-3
strategy = tf.distribute.MirroredStrategy()
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))
with strategy.scope():
    model = unet((None, None, None, 1), FILTERS, KERNEL, model_name=MODEL_NAME)
    model.compile(optimizer=Adam(learning_rate=LR),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=['accuracy'])
    model.summary()
# load_dataset_all() is a function for loading the input and mask fields;
# it outputs shapes of [256,128,128,128,4]
X_train, Y_train = load_dataset_all(FILE_DEN, FILE_MSK, SUBGRID)
history = model.fit(X_train, Y_train, batch_size = 4, epochs = 50, verbose = 2, shuffle = True, validation_split = 0.2)
model.save(MODEL_NAME)
# Load and predict
# this is actually in another script but I'm putting this all in one go:
model = load_model(MODEL_NAME)
model.compile(loss=model.loss, optimizer=model.optimizer, metrics=['accuracy'])
# load test data:
X_test = load_dataset()
Y_test = model.predict(X_test, batch_size = 4, verbose = 2)
After some Googling and reading other questions on Stack Overflow, two suggestions come up repeatedly: adjust the batch size so that the number of samples is divisible by it, and switch to different versions of TF/CUDA. Originally my X_test had a shape of [343,128,128,128,4], but I chopped off 3 samples to get it to [340,128,128,128,4] so that it's divisible by my batch size of 4.
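For reference, the chop itself is trivial (a minimal sketch, with my batch size of 4 hard-coded):

batch_size = 4
# keep the largest multiple of batch_size worth of samples (343 -> 340)
n_keep = (X_test.shape[0] // batch_size) * batch_size
X_test = X_test[:n_keep]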
My first test used TF 2.4.1 with CUDA 11.6. I then tried the same code on Colab with TF 2.9.2 and CUDA 11.2 and got the same error, so I doubt the versions are the problem.
Any advice or help would be greatly appreciated. Let me know if there's any other information I can provide.
Thank you!!!
I had the exact same problem and it's now gone. I changed a few things, and at some point the error message changed to "Split on GPU requires input size < max int32", so I'm not exactly sure what the original problem was. I just wanted to give you a list of the things I changed; maybe one of them helps:
Generally, I couldn't and still can't make sense of the error message ("invalid configuration argument"), but I suspect it's a memory problem. My model is even smaller than yours, but our arrays are huge (my inputs are 128x128x128 and my labels 512x512x512).
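For what it's worth, the numbers in your post are at least consistent with the memory idea: 340 samples x 128^3 voxels x 4 classes is about 2.85 billion output elements, which already exceeds the int32 maximum (2,147,483,647) that the "Split on GPU" message refers to. One thing you could try (just a sketch, untested on your setup, and predict_in_chunks is a name I made up) is predicting in smaller chunks and concatenating on the host with NumPy:

import numpy as np

# Run predict() on small slices so no single device-side tensor
# approaches the 2**31 - 1 element limit, then stitch on the CPU.
def predict_in_chunks(model, X, chunk=8, batch_size=4):
    parts = []
    for i in range(0, X.shape[0], chunk):
        parts.append(model.predict(X[i:i + chunk], batch_size=batch_size, verbose=0))
    return np.concatenate(parts, axis=0)

Y_test = predict_in_chunks(model, X_test)

That keeps each call's output around chunk x 128^3 x 4 elements instead of materializing the whole test set at once.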
Hope that helps at least a bit.