python, scikit-learn, time-series, data-stream

In the scikit-multiflow EvaluatePrequential class, what is the difference between the parameters n_wait and batch_size?


My initial understanding was that batch_size alone determines how much new incoming data the model(s) are first tested on and then trained on. So how does n_wait influence that procedure?

Docs: evaluate_prequential

My first guess would be that n_wait does not change the procedure, but only influences how the metrics are calculated. Would you agree?

Bonus: is there an integrated way to handle variable batch sizes in scikit-multiflow?


Solution

  • The batch_size parameter corresponds to the number of data samples passed to the model(s) on each test and train operation.

    As you mention, the n_wait parameter does not change the test-then-train procedure. It controls how much data is considered when evaluating the "current" performance (the last n_wait samples). Additionally, it controls the refresh rate of the evaluation plot.

    For the bonus question, EvaluatePrequential does not support variable batch sizes. However, available learning methods can handle this scenario.
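
To make the distinction concrete, here is a minimal pure-Python sketch of a test-then-train loop. It is not scikit-multiflow's implementation: `MajorityClassModel` and `prequential` are hypothetical names made up for illustration, and the toy model only mimics the `predict` / `partial_fit` interface that scikit-multiflow learners expose. The size of each batch plays the role of batch_size (and may vary from batch to batch), while n_wait only sets the length of the sliding window used for the "current" metric.

```python
from collections import deque


class MajorityClassModel:
    """Hypothetical toy learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, X):
        if not self.counts:
            return [0 for _ in X]  # arbitrary default before any training
        majority = max(self.counts, key=self.counts.get)
        return [majority for _ in X]

    def partial_fit(self, X, y):
        for label in y:
            self.counts[label] = self.counts.get(label, 0) + 1
        return self


def prequential(batches, model, n_wait):
    """Test-then-train over (X, y) batches of any size.

    Returns (overall_accuracy, current_accuracy); the latter is computed
    over the last n_wait samples only.
    """
    window = deque(maxlen=n_wait)  # sliding window for "current" performance
    correct = total = 0
    for X, y in batches:
        y_pred = model.predict(X)  # 1) test on the new batch
        for truth, pred in zip(y, y_pred):
            hit = int(truth == pred)
            correct += hit
            total += 1
            window.append(hit)
        model.partial_fit(X, y)  # 2) then train on the same batch
    return correct / total, sum(window) / len(window)


# A stream whose label switches from 0 to 1 halfway through,
# cut into batches of varying size (2, 3, 1, 4, 2).
labels = [0] * 6 + [1] * 6
features = [[i] for i in range(len(labels))]  # dummy features, unused by the toy model
sizes = [2, 3, 1, 4, 2]
batches, start = [], 0
for size in sizes:
    batches.append((features[start:start + size], labels[start:start + size]))
    start += size

overall, current = prequential(batches, MajorityClassModel(), n_wait=4)
print(overall, current)
```

In this sketch, changing the batch sizes changes when the model gets to learn from the data, whereas changing n_wait only changes which samples the "current" score is averaged over. Running the loop by hand like this is one way to feed variable batch sizes to a learner when the evaluator does not support them.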