I'm getting more and more desperate with my TensorFlow project. It took many hours to get TensorFlow installed before I figured out that PyCharm, Python 3.7, and TF 2.x are somehow not compatible. Now it runs, but after many epochs of training I get a very unspecific cuDNN error. Do you know whether my code is wrong, or whether there is e.g. an installation error? Could you point me in a direction? Searching didn't turn up anything specific either.
My setup (in brackets, what I also tried):
This error occurs after ~3h of training. In other cases (or with other parametrisations of the net) the error occurs much earlier. Here is the full output of the code snippet below:
C:\Users\Fhnx\.virtualenvs\Processing-TA9ofq3q\Scripts\python.exe C:/Users/Fhnx/.../playground/AI_Predictor_Test.py
2020-05-08 11:47:25.924424: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Starting training sweep with Epochs: 10000, LRstart: 0.01, LRend: 5e-05
2020-05-08 11:47:27.887135: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-05-08 11:47:27.912998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-08 11:47:27.913212: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-08 11:47:27.921203: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-08 11:47:27.930115: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-05-08 11:47:27.932760: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-05-08 11:47:27.944938: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-05-08 11:47:27.952321: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-05-08 11:47:27.960042: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-08 11:47:27.960698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-08 11:47:27.961058: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-05-08 11:47:27.969636: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2df4e1dcd00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-08 11:47:27.969831: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-05-08 11:47:27.970579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-08 11:47:27.970964: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-08 11:47:27.971208: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-08 11:47:27.971389: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-05-08 11:47:27.971602: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-05-08 11:47:27.971839: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-05-08 11:47:27.972112: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-05-08 11:47:27.972324: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-08 11:47:27.973322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-08 11:47:28.530960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-08 11:47:28.531109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-05-08 11:47:28.531180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-05-08 11:47:28.532337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6213 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-05-08 11:47:28.534819: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2df7aeb31a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-08 11:47:28.534946: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2070 SUPER, Compute Capability 7.5
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 22)] 0
__________________________________________________________________________________________________
tf_op_layer_ExpandDims (TensorF [(None, 22, 1)] 0 input_1[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 22, 64) 128 tf_op_layer_ExpandDims[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 22, 64) 128 tf_op_layer_ExpandDims[0][0]
__________________________________________________________________________________________________
dense_6 (Dense) (None, 22, 64) 128 tf_op_layer_ExpandDims[0][0]
__________________________________________________________________________________________________
dense_9 (Dense) (None, 22, 64) 128 tf_op_layer_ExpandDims[0][0]
__________________________________________________________________________________________________
dense_12 (Dense) (None, 22, 64) 128 tf_op_layer_ExpandDims[0][0]
__________________________________________________________________________________________________
dense_15 (Dense) (None, 22, 64) 128 tf_op_layer_ExpandDims[0][0]
__________________________________________________________________________________________________
gaussian_dropout (GaussianDropo (None, 22, 64) 0 dense[0][0]
__________________________________________________________________________________________________
gaussian_dropout_2 (GaussianDro (None, 22, 64) 0 dense_3[0][0]
__________________________________________________________________________________________________
gaussian_dropout_4 (GaussianDro (None, 22, 64) 0 dense_6[0][0]
__________________________________________________________________________________________________
gaussian_dropout_6 (GaussianDro (None, 22, 64) 0 dense_9[0][0]
__________________________________________________________________________________________________
gaussian_dropout_8 (GaussianDro (None, 22, 64) 0 dense_12[0][0]
__________________________________________________________________________________________________
gaussian_dropout_10 (GaussianDr (None, 22, 64) 0 dense_15[0][0]
__________________________________________________________________________________________________
bidirectional (Bidirectional) (None, 22, 16) 4672 gaussian_dropout[0][0]
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, 22, 16) 4672 gaussian_dropout_2[0][0]
__________________________________________________________________________________________________
bidirectional_4 (Bidirectional) (None, 22, 16) 4672 gaussian_dropout_4[0][0]
__________________________________________________________________________________________________
bidirectional_6 (Bidirectional) (None, 22, 16) 4672 gaussian_dropout_6[0][0]
__________________________________________________________________________________________________
bidirectional_8 (Bidirectional) (None, 22, 16) 4672 gaussian_dropout_8[0][0]
__________________________________________________________________________________________________
bidirectional_10 (Bidirectional (None, 22, 16) 4672 gaussian_dropout_10[0][0]
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 22, 16) 1600 bidirectional[0][0]
__________________________________________________________________________________________________
bidirectional_3 (Bidirectional) (None, 22, 16) 1600 bidirectional_2[0][0]
__________________________________________________________________________________________________
bidirectional_5 (Bidirectional) (None, 22, 16) 1600 bidirectional_4[0][0]
__________________________________________________________________________________________________
bidirectional_7 (Bidirectional) (None, 22, 16) 1600 bidirectional_6[0][0]
__________________________________________________________________________________________________
bidirectional_9 (Bidirectional) (None, 22, 16) 1600 bidirectional_8[0][0]
__________________________________________________________________________________________________
bidirectional_11 (Bidirectional (None, 22, 16) 1600 bidirectional_10[0][0]
__________________________________________________________________________________________________
conv1d (Conv1D) (None, 20, 13) 1780 bidirectional_1[0][0]
__________________________________________________________________________________________________
conv1d_4 (Conv1D) (None, 20, 13) 1780 bidirectional_3[0][0]
__________________________________________________________________________________________________
conv1d_8 (Conv1D) (None, 20, 13) 1780 bidirectional_5[0][0]
__________________________________________________________________________________________________
conv1d_12 (Conv1D) (None, 20, 13) 1780 bidirectional_7[0][0]
__________________________________________________________________________________________________
conv1d_16 (Conv1D) (None, 20, 13) 1780 bidirectional_9[0][0]
__________________________________________________________________________________________________
conv1d_20 (Conv1D) (None, 20, 13) 1780 bidirectional_11[0][0]
__________________________________________________________________________________________________
conv1d_1 (Conv1D) (None, 20, 10) 1620 conv1d[0][0]
__________________________________________________________________________________________________
conv1d_5 (Conv1D) (None, 20, 10) 1620 conv1d_4[0][0]
__________________________________________________________________________________________________
conv1d_9 (Conv1D) (None, 20, 10) 1620 conv1d_8[0][0]
__________________________________________________________________________________________________
conv1d_13 (Conv1D) (None, 20, 10) 1620 conv1d_12[0][0]
__________________________________________________________________________________________________
conv1d_17 (Conv1D) (None, 20, 10) 1620 conv1d_16[0][0]
__________________________________________________________________________________________________
conv1d_21 (Conv1D) (None, 20, 10) 1620 conv1d_20[0][0]
__________________________________________________________________________________________________
conv1d_2 (Conv1D) (None, 20, 7) 1620 conv1d_1[0][0]
__________________________________________________________________________________________________
conv1d_6 (Conv1D) (None, 20, 7) 1620 conv1d_5[0][0]
__________________________________________________________________________________________________
conv1d_10 (Conv1D) (None, 20, 7) 1620 conv1d_9[0][0]
__________________________________________________________________________________________________
conv1d_14 (Conv1D) (None, 20, 7) 1620 conv1d_13[0][0]
__________________________________________________________________________________________________
conv1d_18 (Conv1D) (None, 20, 7) 1620 conv1d_17[0][0]
__________________________________________________________________________________________________
conv1d_22 (Conv1D) (None, 20, 7) 1620 conv1d_21[0][0]
__________________________________________________________________________________________________
conv1d_3 (Conv1D) (None, 20, 4) 1620 conv1d_2[0][0]
__________________________________________________________________________________________________
conv1d_7 (Conv1D) (None, 20, 4) 1620 conv1d_6[0][0]
__________________________________________________________________________________________________
conv1d_11 (Conv1D) (None, 20, 4) 1620 conv1d_10[0][0]
__________________________________________________________________________________________________
conv1d_15 (Conv1D) (None, 20, 4) 1620 conv1d_14[0][0]
__________________________________________________________________________________________________
conv1d_19 (Conv1D) (None, 20, 4) 1620 conv1d_18[0][0]
__________________________________________________________________________________________________
conv1d_23 (Conv1D) (None, 20, 4) 1620 conv1d_22[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 20, 4) 16 conv1d_3[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 20, 4) 16 conv1d_7[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 20, 4) 16 conv1d_11[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 20, 4) 16 conv1d_15[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 20, 4) 16 conv1d_19[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 20, 4) 16 conv1d_23[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 20, 128) 640 batch_normalization[0][0]
__________________________________________________________________________________________________
dense_4 (Dense) (None, 20, 128) 640 batch_normalization_1[0][0]
__________________________________________________________________________________________________
dense_7 (Dense) (None, 20, 128) 640 batch_normalization_2[0][0]
__________________________________________________________________________________________________
dense_10 (Dense) (None, 20, 128) 640 batch_normalization_3[0][0]
__________________________________________________________________________________________________
dense_13 (Dense) (None, 20, 128) 640 batch_normalization_4[0][0]
__________________________________________________________________________________________________
dense_16 (Dense) (None, 20, 128) 640 batch_normalization_5[0][0]
__________________________________________________________________________________________________
gaussian_dropout_1 (GaussianDro (None, 20, 128) 0 dense_1[0][0]
__________________________________________________________________________________________________
gaussian_dropout_3 (GaussianDro (None, 20, 128) 0 dense_4[0][0]
__________________________________________________________________________________________________
gaussian_dropout_5 (GaussianDro (None, 20, 128) 0 dense_7[0][0]
__________________________________________________________________________________________________
gaussian_dropout_7 (GaussianDro (None, 20, 128) 0 dense_10[0][0]
__________________________________________________________________________________________________
gaussian_dropout_9 (GaussianDro (None, 20, 128) 0 dense_13[0][0]
__________________________________________________________________________________________________
gaussian_dropout_11 (GaussianDr (None, 20, 128) 0 dense_16[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 2560) 0 gaussian_dropout_1[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten) (None, 2560) 0 gaussian_dropout_3[0][0]
__________________________________________________________________________________________________
flatten_2 (Flatten) (None, 2560) 0 gaussian_dropout_5[0][0]
__________________________________________________________________________________________________
flatten_3 (Flatten) (None, 2560) 0 gaussian_dropout_7[0][0]
__________________________________________________________________________________________________
flatten_4 (Flatten) (None, 2560) 0 gaussian_dropout_9[0][0]
__________________________________________________________________________________________________
flatten_5 (Flatten) (None, 2560) 0 gaussian_dropout_11[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 1) 2561 flatten[0][0]
__________________________________________________________________________________________________
dense_5 (Dense) (None, 1) 2561 flatten_1[0][0]
__________________________________________________________________________________________________
dense_8 (Dense) (None, 1) 2561 flatten_2[0][0]
__________________________________________________________________________________________________
dense_11 (Dense) (None, 1) 2561 flatten_3[0][0]
__________________________________________________________________________________________________
dense_14 (Dense) (None, 1) 2561 flatten_4[0][0]
__________________________________________________________________________________________________
dense_17 (Dense) (None, 1) 2561 flatten_5[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 6) 0 dense_2[0][0]
dense_5[0][0]
dense_8[0][0]
dense_11[0][0]
dense_14[0][0]
dense_17[0][0]
==================================================================================================
Total params: 97,542
Trainable params: 97,494
Non-trainable params: 48
__________________________________________________________________________________________________
***** Training Net ForkedConvLSTM_D64_LSTM2x8_Conv4x20x4_D1x128_dr0.40 now *****
BatchSize: 2108, NumNetParams: 97542, Feature shape: (500000, 22), Output shape: (500000, 6), In/Out Elem.: 14.0000M with est. size: 448.0000 MB
Epoch 1/10000
2020-05-08 11:47:57.675309: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-08 11:47:57.962354: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-08 11:47:59.216097: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
238/238 [==============================] - 21s 90ms/step - loss: 0.3145 - val_loss: 0.0846 - lr: 0.0100
Epoch 2/10000
238/238 [==============================] - 15s 62ms/step - loss: 0.0851 - val_loss: 0.0837 - lr: 0.0100
[...]
Epoch 694/10000
238/238 [==============================] - 14s 61ms/step - loss: 0.0833 - val_loss: 0.0836 - lr: 5.0000e-05
Epoch 695/10000
6/238 [..............................] - ETA: 12s - loss: 0.08302020-05-08 14:39:02.141015: E tensorflow/stream_executor/dnn.cc:613] CUDNN_STATUS_INTERNAL_ERROR
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1986): 'cudnnRNNBackwardData( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, output_desc.handles(), output_data.opaque(), output_desc.handles(), output_backprop_data.opaque(), output_h_desc.handle(), output_h_backprop_data.opaque(), output_c_desc.handle(), output_c_backprop_data.opaque(), rnn_desc.params_handle(), params.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), input_desc.handles(), input_backprop_data->opaque(), input_h_desc.handle(), input_h_backprop_data->opaque(), input_c_desc.handle(), input_c_backprop_data->opaque(), workspace.opaque(), workspace.size(), reserve_space_data->opaque(), reserve_space_data->size())'
2020-05-08 14:39:02.141642: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at cudnn_rnn_ops.cc:1922 : Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 16, 8, 1, 22, 2108, 8]
2020-05-08 14:39:02.141037: F tensorflow/stream_executor/cuda/cuda_dnn.cc:189] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream.
Process finished with exit code -1073740791 (0xC0000409)
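One thing I have seen suggested for intermittent CUDNN_STATUS_INTERNAL_ERROR crashes is that TensorFlow reserves almost all GPU memory up front and then runs out at some point. I don't know whether that applies here, but for reference, this is the kind of snippet (an untested sketch against the TF 2.1 API) that would switch to on-demand memory growth at the top of the script:
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving almost all of it at
# startup; this must run before the first GPU op executes.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)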
Here is some code which should run and reproduce the above output:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# from os import environ
# environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import *
import tensorflow as tf
import numpy as np
import logging
import sys


def build_model_simple(inputLength=1, outputLength=1, lr=0.0001, device="/gpu:0",
                       dropoutRate=0.4,
                       nNeuFirstDense=64,
                       numLSTM=2, nNeuLSTM=8,
                       numConv=4, nFiltConv=20, szConvKernel=4,
                       numDenseInner=1, nNeuDenseInner=128):
    tf.keras.backend.set_floatx('float32')
    with tf.device(device):
        inputs = Input(shape=(inputLength,), dtype=tf.float32)
        inputExp = tf.expand_dims(inputs, -1)
        allInner = []
        # One independent branch per output element; the branches are merged at the end.
        for _ in range(outputLength):
            inner = Dense(nNeuFirstDense, activation="linear")(inputExp)
            inner = GaussianDropout(rate=dropoutRate)(inner)
            if numLSTM and nNeuLSTM:
                for _ in range(numLSTM):
                    inner = Bidirectional(LSTM(nNeuLSTM, return_sequences=True))(inner)
            if numConv:
                for _ in range(numConv):
                    inner = Conv1D(filters=nFiltConv, kernel_size=szConvKernel,
                                   strides=1, padding='valid',
                                   data_format='channels_first')(inner)
                inner = BatchNormalization()(inner)
            if numDenseInner:
                for _ in range(numDenseInner):
                    inner = Dense(nNeuDenseInner, activation="linear")(inner)
                    inner = GaussianDropout(rate=dropoutRate)(inner)
            inner = Flatten()(inner)
            inner = Dense(1, activation="linear")(inner)
            allInner.append(inner)
        out = Concatenate()(allInner)
        # out = outTmp * outTmp * outTmp
        model = Model(inputs=inputs, outputs=out)
        model.compile(loss="mse", optimizer=Adam(lr=lr))
        # model.compile(loss="mse", optimizer=Adadelta())
    return model, 'ForkedConvLSTM_D{}_LSTM{}x{}_Conv{}x{}x{}_D{}x{}_dr{:.2f}'.format(
        nNeuFirstDense,
        numLSTM, nNeuLSTM,
        numConv, nFiltConv, szConvKernel,
        numDenseInner, nNeuDenseInner,
        dropoutRate)


def scheduler(epoch, lrStart, lrEnd, lrDecay=0.05, lrNStable=10):
    # Hold lrStart for the first lrNStable epochs, then decay exponentially towards lrEnd.
    lr = lrStart
    if epoch > lrNStable:
        fac = tf.math.exp(lrDecay * (lrNStable - epoch))
        lr = lrStart * fac + lrEnd * (1 - fac)
    return lr


if __name__ == '__main__':
    numFeatures = 22
    numOutputs = 6
    trainIn = np.random.rand(500000, numFeatures)
    trainOut = np.random.rand(500000, numOutputs)
    valiIn = np.random.rand(12000, numFeatures)
    valiOut = np.random.rand(12000, numOutputs)
    numDataElements = trainIn.shape[0] * (trainIn.shape[1] + trainOut.shape[1])
    sizeCalc = numDataElements * sys.getsizeof(trainIn[0][0])
    EPOCHS = 10000
    LEARNING_RATE_START = 0.01
    LEARNING_RATE_END = 0.00005
    LEARNING_DECAY = 0.05
    print("Starting training sweep with Epochs: {}, LRstart: {}, LRend: {}".format(
        EPOCHS, LEARNING_RATE_START, LEARNING_RATE_END))
    network, nwName = build_model_simple(inputLength=numFeatures, outputLength=numOutputs)
    netWeights = network.get_weights()
    numNetParams = np.sum([np.prod(ele.shape) for ele in netWeights])
    # Batch size estimate: usable GPU RAM (8 GB * 0.9) divided by the number of
    # net parameters gives ~74k; dividing by an empirical factor of 35 yields a
    # rough batch size (2108 for this net).
    BATCH_SIZE = int(np.floor(8 * 1e9 * 0.9 / numNetParams / 35))
    network.summary()
    print("***** Training Net {} now *****".format(nwName))
    print("BatchSize: {}, NumNetParams: {}, Feature shape: {}, Output shape: "
          "{}, In/Out Elem.: {:.4f}M with est. size: {:.4f} MB".format(
              BATCH_SIZE, numNetParams, trainIn.shape, trainOut.shape,
              numDataElements / 1e6, sizeCalc / 1e6))
    callback = tf.keras.callbacks.LearningRateScheduler(
        lambda x: scheduler(x, LEARNING_RATE_START, LEARNING_RATE_END, LEARNING_DECAY))
    fitRes = network.fit(trainIn, trainOut, batch_size=BATCH_SIZE, epochs=EPOCHS,
                         validation_data=(valiIn, valiOut),
                         callbacks=[callback, tf.keras.callbacks.TerminateOnNaN()],
                         verbose=1)
    logging.info("FINISHED")
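As a quick check of the scheduler math (a hypothetical standalone snippet, same formula as above): fac shrinks towards 0 after the 10 stable epochs, so lr moves from lrStart towards lrEnd.
import math
for epoch in (11, 100, 694):
    fac = math.exp(0.05 * (10 - epoch))
    print(epoch, 0.01 * fac + 5e-5 * (1 - fac))
# epoch 694 gives ~5.0e-05, matching the "lr: 5.0000e-05" in the log above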
For those who come after me:
I played around a lot with different versions. I even tried to get CUDA 10.2 to work by symlinking the new DLLs under the old names, but even that did not fix the bug.
I finally got it to work by removing everything from NVIDIA (including the drivers) and installing the newest CUDA 10.1 release (from the end of 2019) together with the Studio driver from that release, i.e. version 431.86 instead of the latest Studio release, 441.66.
I don't think the previous installations were broken, so my best guess is that the driver version was the problem all along...
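If you end up doing the same clean reinstall, here is a small sanity check I would run afterwards (plain TF 2.x API, nothing specific to my setup) to confirm that the GPU is visible and that the cuDNN RNN path works at all:
import tensorflow as tf

print(tf.__version__)  # 2.1.x in my case
print(tf.config.experimental.list_physical_devices('GPU'))  # should list the RTX card

# A tiny LSTM forward pass to exercise the cuDNN RNN kernels
with tf.device('/gpu:0'):
    x = tf.random.normal([4, 8, 16])
    print(tf.keras.layers.LSTM(8)(x).shape)  # -> (4, 8)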