Tags: python, pytorch, u8darts

Multiple-series training input gives NaN loss, while the same data as a single-series training input does not


I want to train an N-BEATS time series model using Darts. I have a time series DataFrame for each user, so I want to use multiple-series training, but when I feed the list of TimeSeries I immediately get NaN losses during training. If I concatenate all users' TimeSeries into one, I get a normal loss. In both cases the data is scaled, filled, and cast to float32:

data = scaler.transform(filler.transform(data)).astype(np.float32)
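For context, here is a minimal sketch of that preprocessing, assuming Darts' MissingValuesFiller and Scaler transformers, a single TimeSeries named data, and a placeholder train_series of mine used to fit the scaler:

import numpy as np
from darts.dataprocessing.transformers import MissingValuesFiller, Scaler

filler = MissingValuesFiller()  # interpolates missing values by default
scaler = Scaler()               # wraps sklearn's MinMaxScaler by default

scaler.fit(filler.transform(train_series))  # fit on the training portion only
data = scaler.transform(filler.transform(data)).astype(np.float32)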

Here is the code I use to combine the list of TimeSeries into a single TimeSeries. I also have a pure-Darts version of this, but it is much slower for the same result:

import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.models import NBEATSModel

SPLIT = 0.8

if concatenate_to_one_ts:
    all_dfs = []
    all_dfs_cov = []

    # Convert each user's TimeSeries to pandas so they can be concatenated
    for target_ts, cov_ts in zip(list_of_target_ts, list_of_cov_ts):
        all_dfs.append(target_ts.pd_series())
        all_dfs_cov.append(cov_ts.pd_dataframe())
        
    all_dfs = pd.concat(all_dfs)
    all_dfs_cov = pd.concat(all_dfs_cov)
    
    nbr_train_sample = int(len(all_dfs) * SPLIT)

    all_dfs_train = all_dfs[:nbr_train_sample]
    all_dfs_test = all_dfs[nbr_train_sample:]
    
    list_of_target_ts_train = TimeSeries.from_series(all_dfs_train.reset_index(drop=True))
    list_of_target_ts_test = TimeSeries.from_series(all_dfs_test.reset_index(drop=True))
    
    all_dfs_cov_train = all_dfs_cov[:nbr_train_sample]
    all_dfs_cov_test = all_dfs_cov[nbr_train_sample:]
    
    list_of_cov_ts_train = TimeSeries.from_dataframe(all_dfs_cov_train.reset_index(drop=True))
    list_of_cov_ts_test = TimeSeries.from_dataframe(all_dfs_cov_test.reset_index(drop=True))
else:
    # Keep the series separate: split the lists of TimeSeries themselves
    nbr_train_sample = int(len(list_of_target_ts) * SPLIT)
    list_of_target_ts_train = list_of_target_ts[:nbr_train_sample]
    list_of_target_ts_test = list_of_target_ts[nbr_train_sample:]

    list_of_cov_ts_train = list_of_cov_ts[:nbr_train_sample]
    list_of_cov_ts_test = list_of_cov_ts[nbr_train_sample:]

model = NBEATSModel(input_chunk_length=4,
                    output_chunk_length=1,
                    batch_size=512,
                    n_epochs=5,
                    nr_epochs_val_period=1, 
                    model_name="NBEATS_test",
                    generic_architecture=True,
                    force_reset=True,
                    save_checkpoints=True,
                    show_warnings=True,
                    log_tensorboard=True, 
                    torch_device_str='cuda:0'
                   )

model.fit(series=list_of_target_ts_train,
          past_covariates=list_of_cov_ts_train,
          val_series=list_of_target_ts_test,        # the held-out split from above
          val_past_covariates=list_of_cov_ts_test,
          verbose=True,
          num_loader_workers=20)

With multiple-series training I get: Epoch 0: 8%|██████████▉ | 2250/27807 [03:00<34:11, 12.46it/s, loss=nan, v_num=logs, train_loss=nan.0

With single-series training I get: Epoch 0: 24%|█████████████████████████▋ | 669/2783 [01:04<03:24, 10.33it/s, loss=0.00758, v_num=logs, train_loss=0.00875]

I am also confused by the number of samples per epoch given the same batch size: from what I read at https://unit8.com/resources/training-forecasting-models/, I expected the single series to have more samples, since the window-size slicing is not repeated for each of the multiple series.


Solution

    • Regarding the NaNs, I would try reducing the learning rate if I were you. Also double-check that there are no NaNs remaining in your data (see the corresponding entry here); a sketch of both checks follows this list.
    • Regarding the number of samples: each separate time series is split into several (input, output) slices. For the single series, this split is done once overall, whereas for multiple series it is done once per series and all the resulting samples are then regrouped into a common training set. So it is expected to have more training samples with multiple series (and each training sample has fewer dimensions than in the single-multivariate-series case); a worked count example follows below.
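As a minimal sketch of both checks (the 1e-4 value is an arbitrary starting point of mine, not a library recommendation, and the loop assumes the training lists from the question):

import numpy as np
from darts.models import NBEATSModel

# 1) Make sure no NaNs survived the filling/scaling step
for ts in list_of_target_ts_train + list_of_cov_ts_train:
    assert not np.isnan(ts.values()).any(), "NaNs remain in a training series"

# 2) Lower the learning rate; Darts forwards optimizer_kwargs to the
#    underlying torch optimizer (Adam by default, lr=1e-3)
model = NBEATSModel(input_chunk_length=4,
                    output_chunk_length=1,
                    optimizer_kwargs={"lr": 1e-4},
                    # keep the remaining arguments from the question
                    )

To make the sample count concrete, here is a small worked example with the question's chunk lengths; the series lengths are made up for illustration, following the stacked-multivariate framing above:

input_len, output_len = 4, 1
n_users, length = 100, 500

# A series of length L yields L - (input_len + output_len) + 1 slices
per_series = length - (input_len + output_len) + 1   # 496

# 100 separate univariate series: slicing happens once per series
samples_multi = n_users * per_series                 # 49600 one-dim samples

# One multivariate series with the 100 users stacked as components:
# slicing happens once, but every sample carries 100 dimensions
samples_single = per_series                          # 496 hundred-dim samples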