python · tensorflow · keras · model · training-data

I can't resolve ValueError: Unable to create dataset (name already exists)


I'm training a model with stratified cross-validation and I always get this error on the second fold. Here is the relevant part of the source code:

import gc
import os

import keras
from keras.backend import clear_session
from keras.callbacks import LearningRateScheduler
from sklearn.model_selection import StratifiedShuffleSplit

# X, y, numClass, n_time_steps, step, build_model, BalancedDataGenerator
# and CustomEarlyStopping are defined elsewhere in the script.
gc.collect()
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
fold_no = 1
annealer = LearningRateScheduler(lambda x: 1e-3 * 0.9 ** x)
callback2 = CustomEarlyStopping(patience=7)  # user-defined early stopping
optimizer = keras.optimizers.Adam(learning_rate=1e-4)
acc_per_fold, loss_per_fold = [], []
needTrain = True
for train_index, test_index in sss.split(X, y):
    clear_session()
    gc.collect()
    model = build_model(X.shape, numClass)
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    nmModel = 'model_overlap_%d_%d_fold%d.h5' % (n_time_steps, step, fold_no)
    print('------------------------------------------------------------------------')
    print(f'Training for fold {fold_no} ...')
    training_generator = BalancedDataGenerator(X[train_index],
                                               y[train_index],
                                               batch_size=256)

    if needTrain:
        history = model.fit(training_generator,
                            epochs=1000,
                            callbacks=[callback2, annealer],
                            verbose=1,
                            validation_data=(X[test_index], y[test_index]))
        # Remove any stale file first, then save -- this is where the error occurs.
        if os.path.exists(nmModel):
            os.remove(nmModel)
        model.save(nmModel)

    model.load_weights(nmModel)
    scores = model.evaluate(X[test_index], y[test_index], verbose=0)
    print(f'Score for fold {fold_no}: {model.metrics_names[0]} of {scores[0]}; '
          f'{model.metrics_names[1]} of {scores[1] * 100}%')
    acc_per_fold.append(scores[1] * 100)
    loss_per_fold.append(scores[0])
    fold_no += 1  # next fold
    del model
    gc.collect()

The error occurs when the program saves the model. Here is the error message:


  File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File d:\tuh3salman\trainmodeloverlapseqbuku_all.py:298
    model.save(nmModel)

  File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\utils\traceback_utils.py:67 in error_handler
    raise e.with_traceback(filtered_tb) from None

  File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\h5py\_hl\group.py:183 in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)

  File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\h5py\_hl\dataset.py:163 in make_new_dset
    dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl, dapl=dapl)

  File h5py\_objects.pyx:54 in h5py._objects.with_phil.wrapper

  File h5py\_objects.pyx:55 in h5py._objects.with_phil.wrapper

  File h5py\h5d.pyx:138 in h5py.h5d.create

ValueError: Unable to create dataset (name already exists)

I've tried downgrading and upgrading some of the libraries, hoping it would help, but I still get the error. I also tried deleting the previous model file and moving it to another folder. Here are some of my library versions, in case they help to solve this problem: h5py 3.9.0, keras 2.8.0, tensorflow 2.8.0.
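For what it's worth, the message itself comes from h5py, not from Keras: h5py raises exactly this ValueError when `create_dataset` is asked to create a name that already exists in the file. A minimal, Keras-free reproduction (the file and dataset names here are made up for illustration):

```python
import os
import tempfile

import h5py

# Write a dataset, then try to create the same name again:
# h5py refuses to overwrite and raises the ValueError from the question.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
err_msg = ""
with h5py.File(path, "w") as f:
    f.create_dataset("kernel", data=[1.0, 2.0, 3.0])
    try:
        f.create_dataset("kernel", data=[4.0, 5.0, 6.0])
    except ValueError as e:
        err_msg = str(e)
print(err_msg)  # message contains "name already exists"
```

So something in the save path is writing the same HDF5 dataset name twice into one file.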

I want to solve this error; I have been searching for a solution for a few days but still get it. One fold takes 12 hours to finish, so it wastes a lot of time just to find out whether the run will succeed or not.


Solution

  • The problem is the way you save the model. Instead of model.save(nmModel), use model.save_weights(nmModel) (the method is save_weights, not save_weight, and it is called on the model, not on the file name). Your loop already restores with model.load_weights(nmModel), so nothing else needs to change. It works for me.
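A minimal sketch of that fix, with a toy two-layer network standing in for the question's build_model (the layer sizes and file name here are invented for illustration). save_weights writes only the weights, which is all the loop needs, since it rebuilds the architecture each fold and restores with load_weights:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Toy stand-in for build_model(); the real architecture comes from the question.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")

# The ".weights.h5" suffix keeps newer Keras versions happy;
# plain ".h5" as in the question also works on Keras 2.x.
nmModel = os.path.join(tempfile.mkdtemp(), "model_fold1.weights.h5")
model.save_weights(nmModel)  # weights only -- note save_weights, not save

# Round trip: zero out the weights, reload, and confirm they are restored.
w_before = model.get_weights()
model.set_weights([np.zeros_like(w) for w in w_before])
model.load_weights(nmModel)
assert all(np.allclose(a, b) for a, b in zip(w_before, model.get_weights()))
```

Because only the weights are serialized, the optimizer state and training config are not written to the file, which sidesteps the duplicate-dataset write that model.save was hitting.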