I'm new to ML and Kaggle. I was going through the solution of a Kaggle Challenge.
Challenge: https://www.kaggle.com/c/trackml-particle-identification
Solution: https://www.kaggle.com/outrunner/trackml-2-solution-example
While going through the code, I noticed that the author has used only train_1 file (not train_2, 3, …).
I know there is some strategy involved behind using only the train_1 file. Can someone, please, explain why is it so? Also, what are the use of blacklist_training.zip, train_sample.zip, and detectors.zip files?
I'm one of the organiser of the challenge. train_1 2 3 .. files are all equivalent. Outrunner has probably seen there was no improvement using more data.
train_sample.zip is a small dataset equivalent to train_1 2 3... provided for convenience.
blacklist_training.zip is a list of particles to be ignored due to a small bug in the simulator (not very important).
detectors.zip is the list of the geometrical surfaces where the x y z measurements are made.
David