How do I add and use MNIST in Julia 1.6.6?


The code for Mohammad Nauman's excellent book shows this (for Julia 1.5.3):

using Flux, Statistics 
using Flux.Data.MNIST
using Flux: onehotbatch

Which fails under Julia 1.6.6 with

UndefVarError: MNIST not defined

Stacktrace:
 [1] eval
   @ ./boot.jl:360 [inlined]
 [2] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1116

So I try

] add MNIST

which gives

The following package names could not be resolved:
 * MNIST (not found in project, manifest or registry)

If I try

using MNIST

it gives

ArgumentError: Package MNIST not found in current path:
- Run `import Pkg; Pkg.add("MNIST")` to install the MNIST package.

If I then try the recommended

import Pkg; Pkg.add("MNIST")

it gives

The following package names could not be resolved:
 * MNIST (not found in project, manifest or registry)

The author's code also gives the same error under 1.6.6.

How can I use MNIST under Julia 1.6.6?


Solution

  • The MNIST dataset is available from the MLDatasets.jl package.

    The package documentation has a section on MNIST with more details.

    ]add MLDatasets
    
    using MLDatasets
    
    # load training set
    train_x, train_y = MNIST.traindata()
    
    # load test set
    test_x,  test_y  = MNIST.testdata()
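
    Since the question's code imports onehotbatch, the arrays usually need one more step before a Flux model can consume them. The following is a minimal sketch, assuming the traindata API used in this answer (newer MLDatasets releases, as far as I know, deprecate it in favor of MNIST(split=:train)) and assuming a dense model that expects one flattened column per image. The names X and Y are only illustrative.

    ```julia
    using MLDatasets            # provides MNIST.traindata / MNIST.testdata
    using Flux: onehotbatch

    # Load the training set as Float32: train_x is 28×28×60000,
    # train_y is a vector of 60000 integer labels in 0:9.
    train_x, train_y = MNIST.traindata(Float32)

    # Collapse each 28×28 image into one 784-element column (784×60000).
    X = reshape(train_x, :, size(train_x, 3))

    # One-hot encode the labels over the ten digit classes (10×60000).
    Y = onehotbatch(train_y, 0:9)
    ```

    The first call to traindata downloads the files and may ask for confirmation before doing so.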
    

    To expand on the above with some background: I don't have the book, so I can't check exactly which version of Flux it uses, but it must be a version prior to v0.12.0, which is when the datasets were removed (see commit b78cd76) in favor of MLDatasets (relevant PR). A different Julia version does not prevent you from installing an older version of Flux, but I would not recommend doing so if this is the only issue you're facing. Up-to-date tutorials will be using MLDatasets, and the Julia community in general tends to converge on a single package for a particular purpose.

    To clarify the example above, where you would previously write:

    train_x = MNIST.images(:train)
    train_y = MNIST.labels(:train)
    
    test_x = MNIST.images(:test)
    test_y = MNIST.labels(:test)
    

    you would now use MNIST.traindata() and MNIST.testdata() from MLDatasets, as shown at the top of this answer. The labels are identical in the two cases:

    julia> train_x, train_y = MLDatasets.MNIST.traindata();
    
    julia> Flux.Data.MNIST.labels(:train) == train_y
    true
    

    However, Flux.Data.MNIST.images(:train) returns a Vector of images (28x28 matrices with eltype Gray{N0f8}), while MLDatasets returns (more or less) a 3D tensor (28x28x60000). To get data identical to what Flux.Data.MNIST returns, we need to split the tensor into its matrices, turn them into images (Gray elements), and transpose them.

    julia> using ImageCore
    julia> map(transpose, eachslice(Gray.(train_x); dims=3)) == Flux.Data.MNIST.images(:train)
    true
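
    The slice-and-transpose step is easier to see on a tiny array. This is only a sketch on a hypothetical 2×3×4 stand-in for train_x, using nothing beyond Base Julia; the Gray conversion is left out because it needs ImageCore.

    ```julia
    # Hypothetical stand-in for train_x: 4 "images" of size 2×3.
    x = reshape(collect(1:24), 2, 3, 4)

    # Split along the third dimension and transpose each slice.
    # permutedims returns a plain Matrix; transpose would return a lazy wrapper.
    imgs = [permutedims(s) for s in eachslice(x; dims=3)]

    println(length(imgs))    # 4
    println(size(imgs[1]))   # (3, 2)
    ```

    Each slice x[:, :, i] becomes its own matrix with rows and columns swapped, which is the layout change the one-liner above applies to the real data.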
    

    If you decide that you prefer using an older version of Flux, you could try any release from v0.12.2 through v0.12.10. They are compatible with your Julia version and "still" have Flux.Data.MNIST (the datasets were added back but marked as deprecated):

    pkg> add Flux#v0.12.10
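
    If you would rather do this from a script than from the Pkg REPL, the same idea can be sketched with the Pkg API. This assumes you want the older Flux isolated in its own environment; the directory name is hypothetical.

    ```julia
    import Pkg

    # Use a separate environment so the older Flux does not affect other projects.
    Pkg.activate("flux-0.12-env")   # hypothetical directory name

    # Request the 0.12.10 release, then pin it so later updates leave it alone.
    Pkg.add(name="Flux", version="0.12.10")
    Pkg.pin("Flux")

    # With that release installed, the book's imports should load again
    # (with a deprecation warning, per the note above).
    using Flux
    using Flux.Data.MNIST
    ```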