I am currently running a training phase using Vowpal Wabbit. The data set is big (4 GB), and it has already been running for a whole night. It is still training, and I don't know how many more days it could take.
Do you know if there's a way to stop the training but keep and save the model in its current state, so as to test it on real data?
If you had known that in advance, you could have used either --save_per_pass
(so a model is saved after every pass), or, if you do just one-pass learning, you can include special examples with tag save_filename,
where filename is the path where the model should be saved.
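For example (the flags here are standard vw options; the exact syntax of the save tag is a sketch, worth double-checking against the docs for your vw version):

```
# Multi-pass learning: checkpoint the model after every pass
# (one model file per pass, derived from the -f name):
vw train.dat -c --passes 10 --save_per_pass -f model.vw

# One-pass learning: trigger a save mid-stream by inserting a special,
# otherwise empty example whose tag starts with "save" into the data,
# e.g. a line like:
#
#   save_/tmp/model.checkpoint|
vw train.dat -f model.vw
```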
If you do multi-pass learning and the first pass has already finished (so a cache file was created), there is no way to include the save
example in the training data, because subsequent passes read from the binary cache rather than your input, so I am afraid there is no easy way to save the model trained so far.
I would say 4 GiB is a small dataset :-). When I trained on a 10 GiB (compressed) dataset (which is also not big), it took two hours without any parallelization, and that included creation of the cache file, which takes most of the time; further passes/experiments are much faster. Of course, it depends on the dataset, online vs. batch learning, the reductions and parameters used, and especially the number of passes and hard-drive speed, but "whole night" seems too long to me for such a small dataset.
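For reference, a minimal sketch of the usual caching workflow (standard vw flags; the file names are made up):

```
# The first run builds the binary cache (the slow part) while training:
vw train.dat -c --passes 5 -f model.vw

# Later experiments reuse the same cache, so they run much faster:
vw train.dat -c --passes 5 -l 0.1 -f model2.vw
```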
As @user3914041 said, check the stderr log.
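The progress lines (average loss, example counter, etc.) go to stderr, so for a long run you can capture and watch them like this:

```
vw train.dat -c --passes 10 -f model.vw 2> train.log
tail -f train.log
```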