python mnist

Load the MNIST dataset from scratch and split it into training, validation and test sets


There are many guides about loading and splitting the MNIST dataset, like this one. They use libraries such as Keras or TensorFlow.

I would like to load the MNIST dataset and split it into training, validation and test sets from scratch, that is, using only built-in Python features (and the NumPy library, if needed).

This is the link to the dataset: MNIST dataset.

  • Can you help me?

Solution

  • You can look at the source code of TensorFlow or Keras to see how they download and parse the dataset without other libraries. Here is the relevant piece of code in PyTorch. It uses this helper code. As far as I can see, that code uses only standard libraries. You may reuse their code (BSD 3-Clause License), or read it to see what has to be done and then write your own.

    Once the data is on your disk and you can load it, there are several ways to create a custom train/validation/test split; see: Python splitting data into random sets
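
To make the two steps above concrete, here is a minimal sketch that uses only the standard library and NumPy. It assumes the four original MNIST files (`train-images-idx3-ubyte.gz`, `train-labels-idx1-ubyte.gz`, `t10k-images-idx3-ubyte.gz`, `t10k-labels-idx1-ubyte.gz`) have already been downloaded into a local folder. The function names (`read_idx`, `split_indices`, `load_mnist_splits`) and the default 10% validation fraction are my own choices for illustration, not taken from any library. The files are in the IDX format: a 4-byte header (two zero bytes, a data-type code, the number of dimensions), then one big-endian 32-bit size per dimension, then the raw data.

```python
import gzip
import os
import struct

import numpy as np


def read_idx(path):
    """Parse an unsigned-byte IDX file (plain or .gz) into a NumPy array."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        # Header: two zero bytes, data-type code, number of dimensions.
        zeros, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zeros != 0 or dtype_code != 0x08:
            raise ValueError("not an unsigned-byte IDX file: %s" % path)
        # One big-endian unsigned 32-bit integer per dimension.
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(shape)


def split_indices(n, val_fraction=0.1, seed=0):
    """Shuffle range(n) and return (train_indices, validation_indices)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_val = int(n * val_fraction)
    return perm[n_val:], perm[:n_val]


def load_mnist_splits(folder, val_fraction=0.1, seed=0):
    """Hypothetical helper: load MNIST from `folder` and carve a validation
    set out of the official training set. The official test set is kept as is."""
    x = read_idx(os.path.join(folder, "train-images-idx3-ubyte.gz"))
    y = read_idx(os.path.join(folder, "train-labels-idx1-ubyte.gz"))
    x_test = read_idx(os.path.join(folder, "t10k-images-idx3-ubyte.gz"))
    y_test = read_idx(os.path.join(folder, "t10k-labels-idx1-ubyte.gz"))
    train_idx, val_idx = split_indices(len(x), val_fraction, seed)
    return {
        "train": (x[train_idx], y[train_idx]),
        "val": (x[val_idx], y[val_idx]),
        "test": (x_test, y_test),
    }
```

Calling `load_mnist_splits("path/to/mnist")` (the folder name is a placeholder) should give images as `uint8` arrays of shape `(N, 28, 28)` and labels of shape `(N,)`. Since MNIST already ships with a separate test set, this sketch only splits the training file; if you prefer a fully custom split, you could concatenate both parts first and apply the same index-shuffling idea.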