I have an RTX 3090 GPU and a 12th-gen Intel i9 processor. My training data is not very large, yet training takes far too long. When the training phase starts, the log says 24 cores are available but, because NUMEXPR_MAX_THREADS is not set, it limits itself to a safe limit of only 8 cores.
Set the NUMEXPR_MAX_THREADS environment variable in your terminal.
You can do so by running this in your CLI: export NUMEXPR_MAX_THREADS="24"
if you want to use all 24 cores. The setting lasts only until you close the terminal. To make it permanent, add the same line to your shell profile (.bash_profile, ~/.zshrc, ...).
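A minimal sketch of both steps, assuming a Linux shell where GNU coreutils' nproc is available (on macOS, sysctl -n hw.ncpu reports the same count). Instead of hard-coding 24, it reads the number of logical cores at shell start-up. PROFILE_FILE is a hypothetical variable introduced here for illustration; it defaults to ~/.bashrc, so point it at whichever profile your shell actually reads.

```shell
#!/usr/bin/env bash
# Let NumExpr use every logical core the OS reports (24 on the machine in the question).
# nproc is part of GNU coreutils on Linux; on macOS use `sysctl -n hw.ncpu` instead.
export NUMEXPR_MAX_THREADS="$(nproc)"
echo "NUMEXPR_MAX_THREADS=${NUMEXPR_MAX_THREADS}"

# Persist the setting for future terminals by appending it to a shell profile.
# PROFILE_FILE is a hypothetical name: set it to ~/.zshrc or ~/.bash_profile if needed.
PROFILE_FILE="${PROFILE_FILE:-$HOME/.bashrc}"
# Single quotes keep $(nproc) literal, so the core count is re-read in every new shell.
LINE='export NUMEXPR_MAX_THREADS="$(nproc)"'
# Append only if that exact line is not already present, so re-running is harmless.
grep -qxF "$LINE" "$PROFILE_FILE" 2>/dev/null || echo "$LINE" >> "$PROFILE_FILE"
```

Open a new terminal (or source the profile) afterwards, then start training again; the "safe limit of 8" note should no longer appear.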
The NumExpr message itself is only a note. How slow training is depends mainly on your Rasa config choices and on the number of stories/rules.
Finally, you need to set the parameter use_gpu: True
for TEDPolicy in your config to make TED train faster.