deep-learning nan loss-function transformer-model

Informer: loss is always NaN

I am trying to use the Informer model to make predictions on my own dataset. After switching the training data to my dataset, the program still runs, but the loss is always NaN and no predicted values are produced after training.

I printed train_loss, vali_loss and test_loss. All three are NaN:

Epoch: 1, Steps: 739 | Train Loss: nan Vali Loss: nan Test Loss: nan

Inspecting the value of test_loss shows the following:

test_loss = {float32} nan
time_now = {float} 1678378741.1124253
 strides = {tuple: 0} ()
 size = {int} 1
 shape = {tuple: 0} ()
 ndim = {int} 0
 real = {float32} nan
 nbytes = {int} 4
 itemsize = {int} 4
 imag = {float32} 0.0
 flat = {flatiter: 1} <numpy.flatiter object at 0x0000021FA9C06040>
 flags = {flagsobj}   C_CONTIGUOUS : True\n  F_CONTIGUOUS : True\n  OWNDATA : True\n  WRITEABLE : False\n  ALIGNED : True\n  WRITEBACKIFCOPY : False\n
 dtype = {dtype[float32]: 0} float32
 data = {memoryview: 1} <memory at 0x0000021ECC6AFB80>
 base = {NoneType} None
 T = {float32} nan

As you can see, test_loss and several of its fields are NaN. The MSE and MAE that are output at the end of the run are also NaN.

This is my loss calculation code. The value of pred is all NaN:

epoch_time = time.time()
for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in enumerate(train_loader):
    iter_count += 1

    model_optim.zero_grad()
    pred, true = self._process_one_batch(
        train_data, batch_x, batch_y, batch_x_mark, batch_y_mark)
    loss = criterion(pred, true)
    train_loss.append(loss.item())
    ...
print("Epoch: {} cost time: {}".format(epoch+1, time.time()-epoch_time))
train_loss = np.average(train_loss)
vali_loss = self.vali(vali_data, vali_loader, criterion)
test_loss = self.vali(test_data, test_loader, criterion)

What confuses me is that when I use the dataset originally provided with the model, the program works fine: there are no NaN values anywhere, and it produces predictions.

Here are the columns of the original dataset:

date Visibility DryBulbFarenheit DryBulbCelsius WetBulbFarenheit DewPointFarenheit DewPointCelsius DewPointCelsius RelativeHumidity WindSpeed WindDirection StationPressure Altimeter  WetBulbCelsius(target)

And here are the columns of my dataset:

date hight wind_speed wind_direction temperature humidity atmospheric_pressure(target)

Why does the original dataset run without any problem while my own dataset produces NaN? Where is the problem, and why is my loss always NaN with no predictions?

Solution

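One column in my dataset has exactly the same value in every row. When the data is scaled, I subtract the mean from the maximum and use the result as the denominator. For a constant column the maximum equals the mean, so the denominator is zero, and the numerator (value minus mean) is zero as well. The result is 0/0, so every entry of that column becomes NaN, and the NaN then propagates through the model into pred and the loss.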
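
A minimal sketch of how this can be reproduced and checked, assuming the scaling has the form (x - mean) / (max - mean) described above. The function names scale, constant_columns and safe_scale are hypothetical helpers written for this illustration, not part of the Informer code, and the choice of "hight" as the constant column in the toy data is only an example:

```python
import numpy as np

def scale(data):
    """Scale each column as (x - mean) / (max - mean)."""
    mean = data.mean(axis=0)
    denom = data.max(axis=0) - mean
    # Silence the 0/0 warning so the NaN can be inspected afterwards.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (data - mean) / denom

def constant_columns(data, names):
    """Return the names of columns whose value is identical in every row."""
    return [n for n, col in zip(names, data.T) if np.all(col == col[0])]

def safe_scale(data):
    """Same scaling, but a zero denominator is replaced by 1,
    so a constant column becomes all zeros instead of NaN."""
    mean = data.mean(axis=0)
    denom = data.max(axis=0) - mean
    denom = np.where(denom == 0, 1.0, denom)
    return (data - mean) / denom

# Toy data: the first column is constant (hypothetical example values).
names = ["hight", "wind_speed", "temperature"]
data = np.array([[10.0, 3.0, 20.0],
                 [10.0, 5.0, 22.0],
                 [10.0, 4.0, 21.0]])

scaled = scale(data)
nan_cols = [n for n, bad in zip(names, np.isnan(scaled).any(axis=0)) if bad]
print(nan_cols)                          # columns that turned into NaN
print(constant_columns(data, names))     # columns to drop or fix beforehand
print(np.isnan(safe_scale(data)).any())  # False: no NaN after safe scaling
```

A constant column carries no information for the model, so dropping it before training is usually the simpler fix. Guarding the denominator is an alternative if the column has to stay in the input.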