data-science logistic-regression curve roc random-seed

Should I try multiple different seed values when using a ROC curve to choose variables?


Say I have two subsets of variables, set A and set B. Set A produces a much better ROC curve than set B. However, I have just realised that the ROC curve changes when I use a different seed. Will set A always produce a better ROC curve than set B, or should I produce multiple ROC curves for each set, using different seed values, to compare the subsets of variables?


Solution

  • In many cases the difference between seeds is negligible. If you need to compare how well an algorithm does on different sets of data, you should use the same seed for each of them, so that the only thing that changes between runs is the data.
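
To check whether the seed matters in your case, one option is to repeat the comparison over a handful of seeds, giving both sets the same seed in each repetition. Below is a minimal sketch with scikit-learn. It assumes the seed controls the train/test split, which the question does not state. The data is synthetic, and the helper `auc_for_seed` and the column lists `set_a` and `set_b` are made-up names for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def auc_for_seed(X, y, columns, seed):
    """Fit a logistic regression on a seed-dependent split; return the test AUC."""
    X_train, X_test, y_train, y_test = train_test_split(
        X[:, columns], y, test_size=0.3, stratify=y, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # Probability of the positive class is the score used for the ROC curve.
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


# Synthetic stand-in for the real data (assumption). With shuffle=False the
# first 3 columns are informative and the last 3 are pure noise.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, shuffle=False, random_state=0,
)
set_a = [0, 1, 2]  # hypothetical "set A"
set_b = [3, 4, 5]  # hypothetical "set B"

# Same seed for both sets in each repetition, so each pair shares one split.
seeds = range(10)
auc_a = np.array([auc_for_seed(X, y, set_a, s) for s in seeds])
auc_b = np.array([auc_for_seed(X, y, set_b, s) for s in seeds])
diff = auc_a - auc_b

print(f"set A: mean AUC {auc_a.mean():.3f}, sd {auc_a.std(ddof=1):.3f}")
print(f"set B: mean AUC {auc_b.mean():.3f}, sd {auc_b.std(ddof=1):.3f}")
print(f"A - B: mean {diff.mean():.3f}, min {diff.min():.3f}")
```

If the spread across seeds is small compared with the gap between A and B, the ranking is unlikely to flip with another seed. If the two are of similar size, a single seed is not enough to tell the sets apart.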