Tags: python, python-3.x, machine-learning, nlp, rasa-nlu

Retraining and updating an existing Rasa NLU model


I've been using Rasa NLU for a project that involves making sense of structured text. My use case requires me to keep updating my training set by adding new examples of text corpus entities. This means I have to retrain my model every few days, and each run takes longer as the training set grows.

Is there a way in Rasa NLU to update an already trained model by training it with only the new data, instead of retraining it from scratch on the combined old and new training sets?

I'm looking for an approach where I can update my existing trained model incrementally with the additional training data I collect every few days.


Solution

  • As of this writing, the most recent GitHub issue on the topic states that there is no way to retrain a model by adding only the new utterances; the earlier issues cited there say the same.

    You're right: retraining periodically on increasingly large files gets more and more time-consuming. Even so, retraining in place (overwriting the existing model) is not a good idea in production.

    Excellent example in a user comment:

    Retraining on the same model can be a problem for production systems. I used to overwrite my models, and at some point one of the training runs didn't work perfectly and I started to see a critical drop in my responses' confidence. I had to find where the problem was coming from and retrain the model.

    Training a new model every time (with a timestamp in its name) is good because it makes rollbacks easier (and they will happen in production systems). I then fetch the up-to-date model names from the DB.
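Since the full data set has to be retrained each time, one workable pattern is to keep the new examples in a separate file and merge them into the main training set just before each run. A minimal sketch, assuming the classic Rasa NLU JSON layout (`rasa_nlu_data` / `common_examples`); the function name and the duplicate-skipping policy are my own choices, not part of Rasa:

```python
import json


def merge_examples(base, new):
    """Return a copy of `base` with `new`'s common_examples appended.

    Both arguments follow the classic Rasa NLU JSON layout:
    {"rasa_nlu_data": {"common_examples": [{"text": ..., "intent": ...}, ...]}}
    Duplicate (text, intent) pairs are skipped so repeated merges are safe.
    """
    merged = json.loads(json.dumps(base))  # cheap deep copy via round-trip
    examples = merged["rasa_nlu_data"]["common_examples"]
    seen = {(e["text"], e.get("intent")) for e in examples}
    for e in new["rasa_nlu_data"]["common_examples"]:
        key = (e["text"], e.get("intent"))
        if key not in seen:
            examples.append(e)
            seen.add(key)
    return merged
```

The merged dict can then be dumped back to disk and handed to the usual training pipeline, so the incremental file stays small while the full set is what actually gets trained.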
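The timestamped-model advice above can be sketched as a small registry. This is a hypothetical helper, not a Rasa API: in production the names would live in a database table, and a plain list stands in for it here. The generated name is what you would pass when persisting a newly trained model:

```python
from datetime import datetime, timezone


class ModelRegistry:
    """Tracks timestamped model names so a rollback is one lookup away.

    Hypothetical sketch: a list stands in for the DB table that would
    normally store the model names.
    """

    def __init__(self):
        self._names = []

    def new_model_name(self, prefix="nlu"):
        # e.g. "nlu_20240101-120000" -- use as the persisted model's name
        stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
        name = f"{prefix}_{stamp}"
        self._names.append(name)
        return name

    def current(self):
        """Name of the model the service should currently load."""
        return self._names[-1] if self._names else None

    def rollback(self):
        """Drop the newest model (e.g. after a confidence drop) and
        fall back to the previous one."""
        if self._names:
            self._names.pop()
        return self.current()
```

Because every training run gets its own name, a bad run never clobbers the last known-good model; reverting is just pointing the service back at the previous entry.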