I'm currently working on a seminar paper on NLP, specifically summarization of source code function documentation. For this I've created my own dataset with ca. 64,000 samples (37,453 of them form the training set) and I want to fine-tune the BART model. I use the simpletransformers package, which is built on top of the Hugging Face transformers package. My dataset is a pandas DataFrame with a text and a summary column. An example of my dataset:
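(Schematically; the rows below are made-up placeholders just to illustrate the column layout, not actual samples from my dataset:)

import pandas as pd

# Illustrative rows only -- the real dataset has ca. 64,000 of these pairs
example_df = pd.DataFrame({
    "text": ["Opens the given file, reads it line by line and returns the lines as a list. Raises IOError if the file does not exist."],
    "summary": ["Read a file into a list of lines."],
})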
My code:
import logging

import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

# Load the training split and rename the columns to the names simpletransformers expects
train_df = pd.read_csv(train_path, index_col=0)
train_df.rename(columns={'text': 'input_text', 'summary': 'target_text'}, inplace=True)
# Logging
logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)
# Hyperparameters
model_args = Seq2SeqArgs()
model_args.num_train_epochs = 10
# batch sizes used: bart-base = 32, bart-large-cnn = 16
model_args.train_batch_size = 16
# model_args.no_save = True
# model_args.evaluate_generated_text = True
model_args.evaluate_during_training = True
model_args.evaluate_during_training_verbose = True
model_args.overwrite_output_dir = True
model_args.save_model_every_epoch = False
model_args.save_eval_checkpoints = False
model_args.save_optimizer_and_scheduler = False
model_args.save_steps = -1
best_model_dir = 'drive/MyDrive/outputs/bart-large-cnn/best_model/'
model_args.best_model_dir = best_model_dir
# Initialize model
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-base",
    args=model_args,
    use_cuda=True,
)
# Train the model
model.train_model(
    train_df,
    # eval_data=eval_df,
    # matches=count_matches,
)
Everything is fine so far, BUT I get this error when I start the training.
Here is the error from a run I did in a Colab notebook:
Exception in thread Thread-14:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.7/multiprocessing/pool.py", line 470, in _handle_results
task = get()
File "/usr/lib/python3.7/multiprocessing/connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/reductions.py", line 287, in rebuild_storage_fd
storage = cls._new_shared_fd(fd, size)
RuntimeError: unable to mmap 1024 bytes from file <filename not specified>: Cannot allocate memory (12)
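For what it's worth, the traceback ends inside torch's multiprocessing reductions, i.e. in the code that rebuilds tensors shared between worker processes, not in an ordinary allocation. A quick way to see which tensor-sharing strategy torch is using (purely a diagnostic sketch, not part of my training script):

import torch.multiprocessing as mp

# On Linux the default strategy is usually 'file_descriptor'; 'file_system' is the other option.
print(mp.get_sharing_strategy())        # strategy currently in use
print(mp.get_all_sharing_strategies())  # strategies available on this platform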
One would think that I simply don't have enough memory, but this was my System Monitor ca. 3 seconds after the error:
and this was the lowest my available/free memory got between starting the training and getting the error:
After a lot of tuning I found out that, for some reason, everything works fine when I train the model with a dataset of at most ca. 21,000 samples. It doesn't matter whether I train the "base" or the "large-cnn" version of the BART model; it only depends on the size of my dataset. The error always occurs during the "Creating features from dataset file at cache_dir/" phase.
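Concretely, the experiment looks roughly like this (a sketch; 21000 is just the approximate threshold I observed):

# Training on a slice of at most ~21000 rows runs through without problems ...
model.train_model(train_df.iloc[:21000])

# ... while the full 37,453-row training set fails with the mmap error above.
model.train_model(train_df)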
So what have I already tried:
I added a lot of swap memory (as you can see in the screenshot of my System Monitor)
reduced the number of workers to 1
I increased both the hard and the soft limit of my system's open files limit (ulimit -n) to 86000 (a quick way to check what the training process actually sees is sketched after this list)
I also tried to train the model in a Google Colab notebook, but I had the same issue: if the dataset size goes over ca. 21,000 samples, the training fails. This happened even after I doubled the memory of my Colab session while keeping the dataset size just a little bit over the 21,000 limit.
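For completeness, this is how the open-files limit seen by the training process can be double-checked (a sketch using the standard resource module, not part of the original script):

import resource

# Soft and hard limits for open file descriptors, as seen by the current process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files limit: soft={soft}, hard={hard}")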
Desktop:
transformers 4.6.0
simpletransformers 0.61.4
Ubuntu 20.04.2 LTS
After trying to solve this by myself for literally weeks, I would be more than happy if any of you has an idea how I can solve this :)
(I am aware of the post "mmap returns can not allocate memory, even though there is enough"; unfortunately it couldn't solve my problem. My vm.max_map_count is at 860000.)
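(For reference, the current vm.max_map_count can be read from inside Python on Linux like this; again only a sketch for verifying the setting:)

# vm.max_map_count limits how many memory-mapped regions a single process may create
with open("/proc/sys/vm/max_map_count") as f:
    print(f.read().strip())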
So I just found a simple workaround.
You can just set the use_multiprocessing option of the model to False:
model_args.use_multiprocessing = False
Now I can run the training with my whole dataset.
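For context, a minimal sketch of where the flag goes in the setup from the question; it just needs to be set on the args object before the Seq2SeqModel is built:

model_args = Seq2SeqArgs()
# ... all the other args from the question stay unchanged ...
model_args.use_multiprocessing = False  # build the features in the main process instead of worker processes

model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-base",
    args=model_args,
    use_cuda=True,
)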