I'm getting more and more desperate with my TensorFlow project. It took many hours to get TensorFlow installed before I figured out that PyCharm, Python 3.7, and TF 2.x are somehow not compatible. Now it runs, but after many epochs of training I get a very unspecific cuDNN error. Do you know whether my code is wrong, or whether there is e.g. an installation error? Could you point me in a direction? Searching didn't turn up anything specific either.
My setup (in brackets, what I also tried):
This error occurs after ~3h of training. In other cases (or with other parametrisations of the net) the error occurs much earlier. Here is the full output of the code snippet below:
C:\Users\Fhnx\.virtualenvs\Processing-TA9ofq3q\Scripts\python.exe C:/Users/Fhnx/.../playground/AI_Predictor_Test.py
2020-05-08 11:47:25.924424: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Starting training sweep with Epochs: 10000, LRstart: 0.01, LRend: 5e-05
2020-05-08 11:47:27.887135: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-05-08 11:47:27.912998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-08 11:47:27.913212: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-08 11:47:27.921203: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-08 11:47:27.930115: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-05-08 11:47:27.932760: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-05-08 11:47:27.944938: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-05-08 11:47:27.952321: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-05-08 11:47:27.960042: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-08 11:47:27.960698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-08 11:47:27.961058: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-05-08 11:47:27.969636: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2df4e1dcd00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-08 11:47:27.969831: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-05-08 11:47:27.970579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-08 11:47:27.970964: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-08 11:47:27.971208: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-08 11:47:27.971389: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-05-08 11:47:27.971602: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-05-08 11:47:27.971839: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-05-08 11:47:27.972112: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-05-08 11:47:27.972324: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-08 11:47:27.973322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-08 11:47:28.530960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-08 11:47:28.531109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-05-08 11:47:28.531180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-05-08 11:47:28.532337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6213 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-05-08 11:47:28.534819: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2df7aeb31a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-08 11:47:28.534946: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2070 SUPER, Compute Capability 7.5
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 22)] 0
__________________________________________________________________________________________________
tf_op_layer_ExpandDims (TensorF [(None, 22, 1)] 0 input_1[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 22, 64) 128 tf_op_layer_ExpandDims[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 22, 64) 128 tf_op_layer_ExpandDims[0][0]
__________________________________________________________________________________________________
dense_6 (Dense) (None, 22, 64) 128 tf_op_layer_ExpandDims[0][0]
__________________________________________________________________________________________________
dense_9 (Dense) (None, 22, 64) 128 tf_op_layer_ExpandDims[0][0]
__________________________________________________________________________________________________
dense_12 (Dense) (None, 22, 64) 128 tf_op_layer_ExpandDims[0][0]
__________________________________________________________________________________________________
dense_15 (Dense) (None, 22, 64) 128 tf_op_layer_ExpandDims[0][0]
__________________________________________________________________________________________________
gaussian_dropout (GaussianDropo (None, 22, 64) 0 dense[0][0]
__________________________________________________________________________________________________
gaussian_dropout_2 (GaussianDro (None, 22, 64) 0 dense_3[0][0]
__________________________________________________________________________________________________
gaussian_dropout_4 (GaussianDro (None, 22, 64) 0 dense_6[0][0]
__________________________________________________________________________________________________
gaussian_dropout_6 (GaussianDro (None, 22, 64) 0 dense_9[0][0]
__________________________________________________________________________________________________
gaussian_dropout_8 (GaussianDro (None, 22, 64) 0 dense_12[0][0]
__________________________________________________________________________________________________
gaussian_dropout_10 (GaussianDr (None, 22, 64) 0 dense_15[0][0]
__________________________________________________________________________________________________
bidirectional (Bidirectional) (None, 22, 16) 4672 gaussian_dropout[0][0]
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, 22, 16) 4672 gaussian_dropout_2[0][0]
__________________________________________________________________________________________________
bidirectional_4 (Bidirectional) (None, 22, 16) 4672 gaussian_dropout_4[0][0]
__________________________________________________________________________________________________
bidirectional_6 (Bidirectional) (None, 22, 16) 4672 gaussian_dropout_6[0][0]
__________________________________________________________________________________________________
bidirectional_8 (Bidirectional) (None, 22, 16) 4672 gaussian_dropout_8[0][0]
__________________________________________________________________________________________________
bidirectional_10 (Bidirectional (None, 22, 16) 4672 gaussian_dropout_10[0][0]
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 22, 16) 1600 bidirectional[0][0]
__________________________________________________________________________________________________
bidirectional_3 (Bidirectional) (None, 22, 16) 1600 bidirectional_2[0][0]
__________________________________________________________________________________________________
bidirectional_5 (Bidirectional) (None, 22, 16) 1600 bidirectional_4[0][0]
__________________________________________________________________________________________________
bidirectional_7 (Bidirectional) (None, 22, 16) 1600 bidirectional_6[0][0]
__________________________________________________________________________________________________
bidirectional_9 (Bidirectional) (None, 22, 16) 1600 bidirectional_8[0][0]
__________________________________________________________________________________________________
bidirectional_11 (Bidirectional (None, 22, 16) 1600 bidirectional_10[0][0]
__________________________________________________________________________________________________
conv1d (Conv1D) (None, 20, 13) 1780 bidirectional_1[0][0]
__________________________________________________________________________________________________
conv1d_4 (Conv1D) (None, 20, 13) 1780 bidirectional_3[0][0]
__________________________________________________________________________________________________
conv1d_8 (Conv1D) (None, 20, 13) 1780 bidirectional_5[0][0]
__________________________________________________________________________________________________
conv1d_12 (Conv1D) (None, 20, 13) 1780 bidirectional_7[0][0]
__________________________________________________________________________________________________
conv1d_16 (Conv1D) (None, 20, 13) 1780 bidirectional_9[0][0]
__________________________________________________________________________________________________
conv1d_20 (Conv1D) (None, 20, 13) 1780 bidirectional_11[0][0]
__________________________________________________________________________________________________
conv1d_1 (Conv1D) (None, 20, 10) 1620 conv1d[0][0]
__________________________________________________________________________________________________
conv1d_5 (Conv1D) (None, 20, 10) 1620 conv1d_4[0][0]
__________________________________________________________________________________________________
conv1d_9 (Conv1D) (None, 20, 10) 1620 conv1d_8[0][0]
__________________________________________________________________________________________________
conv1d_13 (Conv1D) (None, 20, 10) 1620 conv1d_12[0][0]
__________________________________________________________________________________________________
conv1d_17 (Conv1D) (None, 20, 10) 1620 conv1d_16[0][0]
__________________________________________________________________________________________________
conv1d_21 (Conv1D) (None, 20, 10) 1620 conv1d_20[0][0]
__________________________________________________________________________________________________
conv1d_2 (Conv1D) (None, 20, 7) 1620 conv1d_1[0][0]
__________________________________________________________________________________________________
conv1d_6 (Conv1D) (None, 20, 7) 1620 conv1d_5[0][0]
__________________________________________________________________________________________________
conv1d_10 (Conv1D) (None, 20, 7) 1620 conv1d_9[0][0]
__________________________________________________________________________________________________
conv1d_14 (Conv1D) (None, 20, 7) 1620 conv1d_13[0][0]
__________________________________________________________________________________________________
conv1d_18 (Conv1D) (None, 20, 7) 1620 conv1d_17[0][0]
__________________________________________________________________________________________________
conv1d_22 (Conv1D) (None, 20, 7) 1620 conv1d_21[0][0]
__________________________________________________________________________________________________
conv1d_3 (Conv1D) (None, 20, 4) 1620 conv1d_2[0][0]
__________________________________________________________________________________________________
conv1d_7 (Conv1D) (None, 20, 4) 1620 conv1d_6[0][0]
__________________________________________________________________________________________________
conv1d_11 (Conv1D) (None, 20, 4) 1620 conv1d_10[0][0]
__________________________________________________________________________________________________
conv1d_15 (Conv1D) (None, 20, 4) 1620 conv1d_14[0][0]
__________________________________________________________________________________________________
conv1d_19 (Conv1D) (None, 20, 4) 1620 conv1d_18[0][0]
__________________________________________________________________________________________________
conv1d_23 (Conv1D) (None, 20, 4) 1620 conv1d_22[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 20, 4) 16 conv1d_3[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 20, 4) 16 conv1d_7[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 20, 4) 16 conv1d_11[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 20, 4) 16 conv1d_15[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 20, 4) 16 conv1d_19[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 20, 4) 16 conv1d_23[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 20, 128) 640 batch_normalization[0][0]
__________________________________________________________________________________________________
dense_4 (Dense) (None, 20, 128) 640 batch_normalization_1[0][0]
__________________________________________________________________________________________________
dense_7 (Dense) (None, 20, 128) 640 batch_normalization_2[0][0]
__________________________________________________________________________________________________
dense_10 (Dense) (None, 20, 128) 640 batch_normalization_3[0][0]
__________________________________________________________________________________________________
dense_13 (Dense) (None, 20, 128) 640 batch_normalization_4[0][0]
__________________________________________________________________________________________________
dense_16 (Dense) (None, 20, 128) 640 batch_normalization_5[0][0]
__________________________________________________________________________________________________
gaussian_dropout_1 (GaussianDro (None, 20, 128) 0 dense_1[0][0]
__________________________________________________________________________________________________
gaussian_dropout_3 (GaussianDro (None, 20, 128) 0 dense_4[0][0]
__________________________________________________________________________________________________
gaussian_dropout_5 (GaussianDro (None, 20, 128) 0 dense_7[0][0]
__________________________________________________________________________________________________
gaussian_dropout_7 (GaussianDro (None, 20, 128) 0 dense_10[0][0]
__________________________________________________________________________________________________
gaussian_dropout_9 (GaussianDro (None, 20, 128) 0 dense_13[0][0]
__________________________________________________________________________________________________
gaussian_dropout_11 (GaussianDr (None, 20, 128) 0 dense_16[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 2560) 0 gaussian_dropout_1[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten) (None, 2560) 0 gaussian_dropout_3[0][0]
__________________________________________________________________________________________________
flatten_2 (Flatten) (None, 2560) 0 gaussian_dropout_5[0][0]
__________________________________________________________________________________________________
flatten_3 (Flatten) (None, 2560) 0 gaussian_dropout_7[0][0]
__________________________________________________________________________________________________
flatten_4 (Flatten) (None, 2560) 0 gaussian_dropout_9[0][0]
__________________________________________________________________________________________________
flatten_5 (Flatten) (None, 2560) 0 gaussian_dropout_11[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 1) 2561 flatten[0][0]
__________________________________________________________________________________________________
dense_5 (Dense) (None, 1) 2561 flatten_1[0][0]
__________________________________________________________________________________________________
dense_8 (Dense) (None, 1) 2561 flatten_2[0][0]
__________________________________________________________________________________________________
dense_11 (Dense) (None, 1) 2561 flatten_3[0][0]
__________________________________________________________________________________________________
dense_14 (Dense) (None, 1) 2561 flatten_4[0][0]
__________________________________________________________________________________________________
dense_17 (Dense) (None, 1) 2561 flatten_5[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 6) 0 dense_2[0][0]
dense_5[0][0]
dense_8[0][0]
dense_11[0][0]
dense_14[0][0]
dense_17[0][0]
==================================================================================================
Total params: 97,542
Trainable params: 97,494
Non-trainable params: 48
__________________________________________________________________________________________________
***** Training Net ForkedConvLSTM_D64_LSTM2x8_Conv4x20x4_D1x128_dr0.40 now *****
BatchSize: 2108, NumNetParams: 97542, Feature shape: (500000, 22), Output shape: (500000, 6), In/Out Elem.: 14.0000M with est. size: 448.0000 MB
Epoch 1/10000
2020-05-08 11:47:57.675309: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-08 11:47:57.962354: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-08 11:47:59.216097: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
238/238 [==============================] - 21s 90ms/step - loss: 0.3145 - val_loss: 0.0846 - lr: 0.0100
Epoch 2/10000
238/238 [==============================] - 15s 62ms/step - loss: 0.0851 - val_loss: 0.0837 - lr: 0.0100
[...]
Epoch 694/10000
238/238 [==============================] - 14s 61ms/step - loss: 0.0833 - val_loss: 0.0836 - lr: 5.0000e-05
Epoch 695/10000
6/238 [..............................] - ETA: 12s - loss: 0.08302020-05-08 14:39:02.141015: E tensorflow/stream_executor/dnn.cc:613] CUDNN_STATUS_INTERNAL_ERROR
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1986): 'cudnnRNNBackwardData( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, output_desc.handles(), output_data.opaque(), output_desc.handles(), output_backprop_data.opaque(), output_h_desc.handle(), output_h_backprop_data.opaque(), output_c_desc.handle(), output_c_backprop_data.opaque(), rnn_desc.params_handle(), params.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), input_desc.handles(), input_backprop_data->opaque(), input_h_desc.handle(), input_h_backprop_data->opaque(), input_c_desc.handle(), input_c_backprop_data->opaque(), workspace.opaque(), workspace.size(), reserve_space_data->opaque(), reserve_space_data->size())'
2020-05-08 14:39:02.141642: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at cudnn_rnn_ops.cc:1922 : Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 16, 8, 1, 22, 2108, 8]
2020-05-08 14:39:02.141037: F tensorflow/stream_executor/cuda/cuda_dnn.cc:189] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream.
Process finished with exit code -1073740791 (0xC0000409)
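One thing I have seen suggested for intermittent CUDNN_STATUS_INTERNAL_ERROR crashes is that TensorFlow reserves almost all GPU memory up front and then runs out at some point. I don't know whether that applies here, but for reference, this is the kind of snippet (an untested sketch against the TF 2.1 API) that would switch to on-demand memory growth at the top of the script:
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving almost all of it at
# startup; this must run before the first GPU op executes.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)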
Here is some code which should run and reproduce the above output:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# from os import environ
# environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import *
import tensorflow as tf
import numpy as np
import logging
import sys


def build_model_simple(inputLength=1, outputLength=1, lr=0.0001, device="/gpu:0",
                       dropoutRate=0.4,
                       nNeuFirstDense=64,
                       numLSTM=2, nNeuLSTM=8,
                       numConv=4, nFiltConv=20, szConvKernel=4,
                       numDenseInner=1, nNeuDenseInner=128):
    tf.keras.backend.set_floatx('float32')
    with tf.device(device):
        inputs = Input(shape=(inputLength,), dtype=tf.float32)
        inputExp = tf.expand_dims(inputs, -1)
        allInner = []
        # One independent branch per output element; the branches are merged at the end.
        for _ in range(outputLength):
            inner = Dense(nNeuFirstDense, activation="linear")(inputExp)
            inner = GaussianDropout(rate=dropoutRate)(inner)
            if numLSTM and nNeuLSTM:
                for _ in range(numLSTM):
                    inner = Bidirectional(LSTM(nNeuLSTM, return_sequences=True))(inner)
            if numConv:
                for _ in range(numConv):
                    inner = Conv1D(filters=nFiltConv, kernel_size=szConvKernel,
                                   strides=1, padding='valid',
                                   data_format='channels_first')(inner)
                inner = BatchNormalization()(inner)
            if numDenseInner:
                for _ in range(numDenseInner):
                    inner = Dense(nNeuDenseInner, activation="linear")(inner)
                    inner = GaussianDropout(rate=dropoutRate)(inner)
            inner = Flatten()(inner)
            inner = Dense(1, activation="linear")(inner)
            allInner.append(inner)
        out = Concatenate()(allInner)
        # out = outTmp * outTmp * outTmp
        model = Model(inputs=inputs, outputs=out)
        model.compile(loss="mse", optimizer=Adam(lr=lr))
        # model.compile(loss="mse", optimizer=Adadelta())
    return model, 'ForkedConvLSTM_D{}_LSTM{}x{}_Conv{}x{}x{}_D{}x{}_dr{:.2f}'.format(
        nNeuFirstDense,
        numLSTM, nNeuLSTM,
        numConv, nFiltConv, szConvKernel,
        numDenseInner, nNeuDenseInner,
        dropoutRate)


def scheduler(epoch, lrStart, lrEnd, lrDecay=0.05, lrNStable=10):
    # Hold lrStart for the first lrNStable epochs, then decay exponentially towards lrEnd.
    lr = lrStart
    if epoch > lrNStable:
        fac = tf.math.exp(lrDecay * (lrNStable - epoch))
        lr = lrStart * fac + lrEnd * (1 - fac)
    return lr


if __name__ == '__main__':
    numFeatures = 22
    numOutputs = 6
    trainIn = np.random.rand(500000, numFeatures)
    trainOut = np.random.rand(500000, numOutputs)
    valiIn = np.random.rand(12000, numFeatures)
    valiOut = np.random.rand(12000, numOutputs)
    numDataElements = trainIn.shape[0] * (trainIn.shape[1] + trainOut.shape[1])
    sizeCalc = numDataElements * sys.getsizeof(trainIn[0][0])
    EPOCHS = 10000
    LEARNING_RATE_START = 0.01
    LEARNING_RATE_END = 0.00005
    LEARNING_DECAY = 0.05
    print("Starting training sweep with Epochs: {}, LRstart: {}, LRend: {}".format(
        EPOCHS, LEARNING_RATE_START, LEARNING_RATE_END))
    network, nwName = build_model_simple(inputLength=numFeatures, outputLength=numOutputs)
    netWeights = network.get_weights()
    numNetParams = np.sum([np.prod(ele.shape) for ele in netWeights])
    # Batch size estimate: usable GPU RAM (8 GB * 0.9) divided by the number of
    # net parameters gives ~74k; dividing by an empirical factor of 35 yields a
    # rough batch size (2108 for this net).
    BATCH_SIZE = int(np.floor(8 * 1e9 * 0.9 / numNetParams / 35))
    network.summary()
    print("***** Training Net {} now *****".format(nwName))
    print("BatchSize: {}, NumNetParams: {}, Feature shape: {}, Output shape: "
          "{}, In/Out Elem.: {:.4f}M with est. size: {:.4f} MB".format(
              BATCH_SIZE, numNetParams, trainIn.shape, trainOut.shape,
              numDataElements / 1e6, sizeCalc / 1e6))
    callback = tf.keras.callbacks.LearningRateScheduler(
        lambda x: scheduler(x, LEARNING_RATE_START, LEARNING_RATE_END, LEARNING_DECAY))
    fitRes = network.fit(trainIn, trainOut, batch_size=BATCH_SIZE, epochs=EPOCHS,
                         validation_data=(valiIn, valiOut),
                         callbacks=[callback, tf.keras.callbacks.TerminateOnNaN()],
                         verbose=1)
    logging.info("FINISHED")
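As a quick check of the scheduler math (a hypothetical standalone snippet, same formula as above): fac shrinks towards 0 after the 10 stable epochs, so lr moves from lrStart towards lrEnd.
import math
for epoch in (11, 100, 694):
    fac = math.exp(0.05 * (10 - epoch))
    print(epoch, 0.01 * fac + 5e-5 * (1 - fac))
# epoch 694 gives ~5.0e-05, matching the "lr: 5.0000e-05" in the log above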
For those who come after me:
I played around a lot with different versions. I even tried to get CUDA 10.2 to work by symlinking the new DLLs under the old names, but even that did not fix the bug.
I finally got it to work by removing everything from NVIDIA (including the drivers) and installing the newest CUDA 10.1 release (from the end of 2019) together with the Studio driver from that release, i.e. version 431.86 instead of the latest Studio release, 441.66.
I don't think the previous installations were broken, so my best guess is that the driver version was the problem all along...
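If you end up doing the same clean reinstall, here is a small sanity check I would run afterwards (plain TF 2.x API, nothing specific to my setup) to confirm that the GPU is visible and that the cuDNN RNN path works at all:
import tensorflow as tf

print(tf.__version__)  # 2.1.x in my case
print(tf.config.experimental.list_physical_devices('GPU'))  # should list the RTX card

# A tiny LSTM forward pass to exercise the cuDNN RNN kernels
with tf.device('/gpu:0'):
    x = tf.random.normal([4, 8, 16])
    print(tf.keras.layers.LSTM(8)(x).shape)  # -> (4, 8)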