Even though my train, validation, and test sets are always the same, my test accuracy fluctuates across runs. The only reason I can think of is weight initialization. I am using PyTorch, and I believe it uses an advanced initialization technique by default (Kaiming initialization).
What might be the reason for the accuracy fluctuation even though the train, validation, and test data are the same?
In addition to weight initialisation, dropout between layers also involves randomness, and both of these can lead to different results from run to run.
These random numbers are generated from a seed value, and fixing that seed helps reproduce the results. You can take a look here for how to fix the seed value.
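For illustration, here is a minimal sketch of how one might fix the seeds in PyTorch. The helper name `set_seed` and the seed value are my own choices, and the cuDNN flags only matter when training on a GPU:

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Fix the common sources of randomness for reproducible runs.
    (Hypothetical helper; the seed value is an arbitrary choice.)"""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG (e.g. shuffling, augmentation)
    torch.manual_seed(seed)           # PyTorch CPU RNG (weight init, dropout)
    torch.cuda.manual_seed_all(seed)  # PyTorch GPU RNGs

    # cuDNN may otherwise select non-deterministic kernels;
    # forcing deterministic ones can slow training somewhat.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)  # call once, before building the model and dataloaders
```

Note that the call must happen before the model is constructed, since weight initialization draws from the RNG at construction time.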