python tensorflow keras neural-network tensorflow2.0

tensorflow.keras only runs correctly once

I am using tensorflow.keras in a Jupyter notebook to fit a neural network to some real-world data. The first time I run my code, it works correctly: the network produces models that match the data. The problem is that if I run it again, it can't generate any new models. It fails to find models that match, even though it easily found many during the first run. I'm not getting any error messages. The code runs just as it did before and tries to fit the given data, but no model passes, no matter how low I set the success criterion ('NSE_cut' in the code below).

To get it to create new models, I have to restart the Jupyter notebook kernel, which erases all of the data and processing I've done so far. More problematically, it means I can't fit multiple neural network models in the same session. I need to be able to do multiple neural network runs so I can compare the results and see which is best. The input data is processed to varying degrees; e.g., no smoothing, a tiny bit of smoothing, a bit more smoothing, a crapton of smoothing. I need to do all of these within the same session without restarting the kernel.

What am I doing wrong? Why can this code only produce models during its first run?

Here's the code:

### --- Calculate Neural Network fit for modes --- ###
def nn_fit(reof_ds, q, NSE_cut):
    indx_qual_mode = []
    best_model = []
    best_score = []
    best_nse = 0

    # ----- Train Tensorflow Hydro-to-TPC models mode-by-mode -----
    for mode in reof_ds.mode.values:  
        keras.backend.clear_session()    
        print('Building models for mode-'+str(mode).zfill(2))
        tpc = reof_ds.temporal_modes.sel(mode=int(mode))        

        X = q.values.reshape(len(q),1)
        Y = tpc.values.reshape(len(q),1)
    
        X_train = X
        Y_train = Y     
    
        # -- adapt function is to get the mean and STD used to normalize the input data of the model --
        normalizer.adapt(X_train)    
    
        # ----- Construct the model and get summary -----
        model = build_and_compile_model(normalizer)
        #if vis_tf_nn==0:
        #elif vis_tf_nn==1:
        #  plot_model(model, to_file='NOAA_WF\\hydro2rtpc_mdl\\'+data_src+'\\site-'+str(gaugeID_list[site])+'_tpc'+str(mode+1).zfill(2)+'.png', show_shapes=True, show_layer_names=True)
    
         # ----- Fit the model -----
        train_proc = model.fit(
            X_train, 
            Y_train, 
            callbacks=[callback],
            batch_size=32, 
            epochs=200, 
            verbose=0, 
            #validation_split=0.2
        )    

        # ----- Plot model estimation and original scatter plot -----
        X_sim = tf.linspace(np.amin(X), np.amax(X), X.size * 10)  # evaluate on a dense grid, 10 points per sample
        Y_sim = model.predict(X_sim)
   

        # ----- The second-time REOF mode screening based on quality of regression models. If qualified, export trained model -----
        Y_mdl = model.predict(X[:,0])
        nse = 1 - ( (np.nansum(np.square( Y - Y_mdl ))) / (np.nansum(np.square( Y - np.nanmean(Y) ))) )

        if nse >= NSE_cut: # Moriasi et al., 2007. Consider NSE>0.5 as satisfactory
            
            model.summary()
            print(mode, nse)

            # ----- Plot training progress -----  
            plot_loss(train_proc)

            best_nse = nse
            best_score.append(best_nse)
            best_model.append(model)
            indx_qual_mode.append(mode)

            fig = plt.figure()
            ax = fig.add_axes([0,0,1,1])
            plt.scatter(X, Y, label='Data')
            plt.plot(X_sim, Y_sim, color='r', label='NN-Model')
            plt.xlabel('discharge (m3/d)')
            plt.ylabel(f"mode {mode}")  
            plt.xticks(rotation=45, ha='right')
            plt.yticks(rotation=45)
            plt.legend(loc='upper left')  # labels come from the artists, so they cannot be swapped
            plt.text(0.2,0.5,'NSE: '+"{:.2f}".format(nse), transform=ax.transAxes)
            plt.savefig('test.png', dpi=300, bbox_inches='tight')
            plt.show()
    
            # ----- Export trained model -----
            #best_model.save('test.keras')  
        # ADDED by Knicely: drop per-mode names before the next iteration
        del Y_mdl, X_sim, Y_sim, X, Y, X_train, Y_train, train_proc, model
    keras.backend.clear_session()    
    
    return(best_model, indx_qual_mode)

reof_ds contains the spatial and temporal modes from a rotated empirical orthogonal function.
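For reference, the NSE screening used in the loop above can be written as a small standalone helper. This is only a sketch of the same formula (1 minus the residual sum of squares over the sum of squares about the mean); the function name `nash_sutcliffe` and the sample array are made up for illustration and are not part of the original notebook.

```python
import numpy as np

def nash_sutcliffe(y_obs, y_sim):
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_about_mean (NaNs ignored)."""
    # Flatten both inputs so (n, 1) and (n,) arrays cannot broadcast to (n, n).
    y_obs = np.asarray(y_obs, dtype=float).ravel()
    y_sim = np.asarray(y_sim, dtype=float).ravel()
    residual_ss = np.nansum(np.square(y_obs - y_sim))
    total_ss = np.nansum(np.square(y_obs - np.nanmean(y_obs)))
    return 1.0 - residual_ss / total_ss

# Hypothetical data, purely for illustration.
y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = nash_sutcliffe(y, y)                              # model reproduces the data
mean_only = nash_sutcliffe(y, np.full_like(y, y.mean()))    # model predicts only the mean
```

A perfect fit scores 1 and a model that only predicts the mean scores 0, which is why a threshold such as 0.5 for 'NSE_cut' separates useful fits from trivial ones.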

Solution

Thanks to @rehaqds, I figured out the problem!

I needed to set 'patience' explicitly. 'patience' is the number of epochs without improvement that training will tolerate before it stops. Apparently, whatever my original callback was using didn't work for repeated runs. By setting it to '10', I was able to get results! Woot woot!

You can do this by swapping out 'callback' (a variable defined elsewhere in my notebook and not shown here, rather than anything TensorFlow provides by default) for a new 'early_stopping' callback. To be clear, doing this means training keeps going until 10 epochs have passed without improvement, or until the 200-epoch limit is reached. If runtime or processing usage is a concern for you, you'll want to make 'patience' something smaller.
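To make the effect of 'patience' concrete, here is a rough pure-Python sketch of the stopping rule. It is a simplified illustration, not Keras's actual implementation; the function name `epochs_before_stop` and the loss values are invented for the example.

```python
def epochs_before_stop(losses, patience):
    """Return how many epochs run before a patience-based rule stops training.

    Simplified illustration: an epoch "improves" if its loss is strictly
    lower than the best loss seen so far.
    """
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # training stops after this epoch
    return len(losses)  # ran out of epochs before patience was exhausted

# Hypothetical per-epoch training losses.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.69, 0.70, 0.70, 0.70]

stop_p0 = epochs_before_stop(losses, patience=0)
stop_p2 = epochs_before_stop(losses, patience=2)
stop_p3 = epochs_before_stop(losses, patience=3)
```

With these made-up losses, a small patience gives up during the brief plateau at epochs 4 and 5, while a larger one survives it and picks up the later improvement at epoch 6.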

Again, gotta thank @rehaqds for getting me here. Without your questions, I never would've figured this out. Thanks!

Here is the relevant code:

    # ----- Construct the model and get summary -----
    model = build_and_compile_model(normalizer)
    #if vis_tf_nn==0:
    #elif vis_tf_nn==1:
    #  plot_model(model, to_file='NOAA_WF\\hydro2rtpc_mdl\\'+data_src+'\\site-'+str(gaugeID_list[site])+'_tpc'+str(mode+1).zfill(2)+'.png', show_shapes=True, show_layer_names=True)

     # ----- Fit the model -----

    # Added by knicely
    early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor='loss', # metric to be monitored, typically 'loss' for training loss or 'val_loss' for validation loss. 
        patience=10, # number of epochs with no improvement after which training will be stopped
    )

    train_proc = model.fit(
        X_train, 
        Y_train, 
        callbacks=[early_stopping],
        batch_size=32, 
        epochs=200, 
        verbose=0, 
        #validation_split=0.2
    )    
    # End addition by Knicely

    # # Original train_proc!!!!
    # train_proc = model.fit(
    #     X_train, 
    #     Y_train, 
    #     callbacks=[callback],
    #     batch_size=32, 
    #     epochs=200, 
    #     verbose=0, 
    #     #validation_split=0.2
    # )    

    # ----- Plot model estimation and original scatter plot -----
    X_sim = tf.linspace(np.amin(X), np.amax(X), X.size * 10)  # evaluate on a dense grid, 10 points per sample
    Y_sim = model.predict(X_sim)