I'm a programming enthusiast learning how to write an autoencoder from scratch. I've already made simple neural networks for linear regression problems and non-linear data classification, so I figured this wouldn't be as hard. I got to the point where my autoencoder learns as well as it can, but the output is just the average of all the inputs, like these two:
And here's the output:
If you want to see a video of it training, it's here: https://youtu.be/w8mPVj_lQWI
If I add the other 17 samples (another batch of digits 1 and 2), the output becomes a smeared, average-looking result too:
I designed my network with 3 layers: 64 neurons in the first layer (the input is a 4096-dimensional vector, one value per pixel of a 64x64 image sample), 8 neurons in the bottleneck (second layer), and 4096 output neurons, one for each dimension of the final output.
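In code, that corresponds to a constructor call roughly like the following (a sketch using the Perceptron class listed further down; the variable name ae is just for illustration):

  // 4096 pixel inputs; layer sizes: 64 in the first layer, 8 in the bottleneck, 4096 outputs
  Perceptron ae = new Perceptron(4096, new int[]{64, 8, 4096});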
I'm using tanh as the activation function (except in the last layer, which uses a linear activation) and backpropagation as the learning algorithm, calculating partial derivatives from the output-layer neurons back to the input ones.
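For the tanh neurons that boils down to the usual chain-rule step; roughly, as a standalone helper (a sketch of the math, not a quote from my classes below):

  // dE/dw = (accumulated downstream derivative) * tanh'(z) * (activation of the previous neuron)
  double hiddenWeightGradient(double downstreamDer, double z, double prevActivation) {
    double dtanh = 1 - Math.tanh(z) * Math.tanh(z); // derivative of tanh at the pre-activation z
    return downstreamDer * dtanh * prevActivation;
  }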
In the upper-left corner is the input image and in the middle is the output image. All values range from -1 to 1 (because of the tanh activation), where 1 means white and 0 or below means black.
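For context, turning a 64x64 grayscale image into such a vector in Processing could look roughly like this (a sketch, not my exact pixel-reading code):

  // map each pixel's brightness (0..255) to the -1..1 range the network expects
  double[] imageToVector(PImage img) {
    img.loadPixels();
    double[] x = new double[img.pixels.length]; // 64*64 = 4096 values
    for (int i = 0; i < img.pixels.length; i++) {
      x[i] = map(brightness(img.pixels[i]), 0, 255, -1, 1); // white -> 1, black -> -1
    }
    return x;
  }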
The output image was generated after around 12k epochs over the 2 images, at a learning rate of 5e-6.
One interesting discovery is that if I increase the learning rate to 0.001, the output clearly becomes either a 1 or a 2, but in the wrong order. Take a look at this video: https://youtu.be/LyugJx8RiJ8
Training on a 5-layer neural network does the same thing.
Can you think of any problems the code I've written could have? I didn't use any pre-made libraries; everything is from scratch, including reading the pixels. Here's my code in Processing, if it helps (although it's long and a bit messy):
class Nevron {
  public double w[];   // weights
  public double a;     // activation (output)
  public double z;     // weighted sum of inputs plus bias
  public double b;     // bias
  public double der;   // accumulated derivative of the error w.r.t. this neuron
  public double derW;
  public double derB;
  public double lr = 0.00001;

  // stVhodov = number of inputs
  public Nevron(int stVhodov) {
    w = new double[stVhodov];
    a = 0;
    z = 0;
    der = 0;
    for (int i = 0; i < w.length; i++) {
      w[i] = random(-1, 1);
    }
    b = random(-1, 1);
  }

  // forward pass from a raw input vector (tanh activation)
  public void answer(double[] x) {
    a = 0;
    z = 0;
    for (int i = 0; i < x.length; i++) {
      z = z + x[i] * w[i];
    }
    z += b;
    a = Math.tanh(z);
  }

  // forward pass from the previous layer (tanh activation)
  public void answer(layer l) {
    a = 0;
    z = 0;
    for (int i = 0; i < l.nevron.length; i++) {
      z = z + l.nevron[i].a * w[i];
    }
    z += b;
    a = Math.tanh(z);
  }

  // forward pass for the output layer (linear activation)
  public void answerOut(layer l) {
    a = 0;
    z = 0;
    for (int i = 0; i < l.nevron.length; i++) {
      z = z + l.nevron[i].a * w[i];
    }
    z += b;
    a = z;
  }

  // gradient-descent update, inputs taken from the previous layer
  public void changeWeight(layer l) {
    for (int i = 0; i < l.nevron.length; i++) {
      w[i] = w[i] - der * lr * l.nevron[i].a;
      b = b - der * lr;
    }
    der = 0;
  }

  // gradient-descent update, inputs taken from a raw input vector
  public void changeWeight(double[] x) {
    for (int i = 0; i < x.length; i++) {
      w[i] = w[i] - der * lr * x[i];
      b = b - der * lr;
    }
    der = 0;
  }

  // squared error of this neuron against the target value odg
  public double MSE(double odg) {
    return (odg - a) * (odg - a);
  }

  // derivative for an output neuron; also passes the derivative back through weight wl
  public double derOut(double odg, double wl) {
    der = 2 * (a - odg);
    return 2 * (a - odg) * wl;
  }

  // derivative contribution passed back to the previous layer through weight wl
  public double derHid(double wl) {
    return der * (1 - Math.pow(Math.tanh(z), 2)) * wl;
  }
}
class layer {
  public Nevron nevron[];

  // stNevronov = number of neurons in this layer, stVhodov = number of inputs per neuron
  public layer(int stNevronov, int stVhodov) {
    nevron = new Nevron[stNevronov];
    for (int i = 0; i < stNevronov; i++) {
      nevron[i] = new Nevron(stVhodov);
    }
  }

  public void answer(double[] x) {
    for (int i = 0; i < nevron.length; i++) {
      nevron[i].answer(x);
    }
  }

  public void answer(layer l) {
    for (int i = 0; i < nevron.length; i++) {
      nevron[i].answer(l);
    }
  }

  public void answerOut(layer l) {
    for (int i = 0; i < nevron.length; i++) {
      nevron[i].answerOut(l);
    }
  }

  // collect all activations of this layer into one vector
  public double[] allanswers() {
    double answerOut[] = new double[nevron.length];
    for (int i = 0; i < nevron.length; i++) {
      answerOut[i] = nevron[i].a;
    }
    return answerOut;
  }
}
class Perceptron {
  public layer layer[];
  public double mse = 0;

  // stVhodov = size of the input vector, layeri = number of neurons per layer
  public Perceptron(int stVhodov, int[] layeri) {
    layer = new layer[layeri.length];
    layer[0] = new layer(layeri[0], stVhodov);
    for (int i = 1; i < layeri.length; i++) {
      layer[i] = new layer(layeri[i], layeri[i - 1]);
    }
  }

  // full forward pass; the last layer uses linear activation
  public double[] answer(double[] x) {
    layer[0].answer(x);
    for (int i = 1; i < layer.length - 1; i++) {
      layer[i].answer(layer[i - 1]);
    }
    layer[layer.length - 1].answerOut(layer[layer.length - 2]);
    return layer[layer.length - 1].allanswers();
  }

  public void backprop(double ans[]) {
    mse = 0;
    // hid-out: calculate derivatives
    for (int i = 0; i < layer[layer.length - 1].nevron.length; i++) {
      for (int j = 0; j < layer[layer.length - 2].nevron.length; j++) {
        layer[layer.length - 2].nevron[j].der += layer[layer.length - 1].nevron[i].derOut(ans[i], layer[layer.length - 1].nevron[i].w[j]);
        mse += layer[layer.length - 1].nevron[i].MSE(ans[i]);
      }
    }
    // hid-hid && inp-hid: calculate derivatives
    //println(mse);
    for (int i = layer.length - 2; i > 0; i--) {
      for (int j = 0; j < layer[i].nevron.length - 1; j++) {
        for (int k = 0; k < layer[i - 1].nevron.length; k++) {
          layer[i - 1].nevron[k].der += layer[i].nevron[j].derHid(layer[i].nevron[j].w[k]);
        }
      }
    }
    // hid-out: change weights
    for (int i = layer.length - 1; i > 0; i--) {
      for (int j = 0; j < layer[i].nevron.length; j++) {
        layer[i].nevron[j].changeWeight(layer[i - 1]);
      }
    }
    // inp-hid: change weights
    for (int i = 0; i < layer[0].nevron.length; i++) {
      layer[0].nevron[i].changeWeight(ans);
    }
  }
}
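For completeness, a minimal training loop over these classes could look roughly like this (a simplified sketch; the samples array, the layer sizes and the drawing of the input/output images are placeholders, not my exact sketch):

  Perceptron ae;
  double[][] samples = {}; // fill with one 4096-value vector per training image, e.g. from imageToVector(...)

  void setup() {
    size(200, 200);
    // 4096 inputs; 64, 8 and 4096 neurons per layer, as described above
    ae = new Perceptron(4096, new int[]{64, 8, 4096});
  }

  void draw() {
    for (int s = 0; s < samples.length; s++) {
      double[] out = ae.answer(samples[s]); // forward pass; out is the reconstruction
      ae.backprop(samples[s]);              // for an autoencoder the target is the input itself
    }
    // println(ae.mse); // watch the error go down over the epochs
  }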
I'd be grateful for any help.
In the end I spent most of my time figuring out the best combination of parameters.
All in all, much of it comes down to luck (though you still have a good chance of getting a good training run by the third try), at least if you're implementing it from scratch. I'm sure there are other methods that help the NN jump out of local minima, different gradient-descent variants and so on. Here are my final results from an autoencoder (5 layers with 16, 8, 8, 16 and 4096 neurons) that can encode the faces of Ariana Grande, Tom Cruise and Sabre Norris (source: famousbirthdays.com). The upper images are, of course, the reconstructions my decoder generated.
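With the same Perceptron class as above, that final architecture corresponds to a constructor call like this (again a sketch; the 8-neuron layers form the bottleneck):

  // 4096 pixel inputs; hidden layers of 16, 8, 8 and 16 neurons; 4096 outputs for the reconstruction
  Perceptron faceAE = new Perceptron(4096, new int[]{16, 8, 8, 16, 4096});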
I also made a simple editor where you can mess with the decoder's inputs, and I managed to make Stephen Fry's face:
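Running just the decoder half for that editor can be sketched like this (hypothetical; it assumes the second 8-neuron layer, index 2, is where the code is taken from, so the decoder is layers 3 and 4):

  // feed made-up bottleneck values straight into the decoder layers of the 5-layer network
  double[] decode(Perceptron p, double[] code) {
    p.layer[3].answer(code);          // first decoder layer reads the 8 code values
    p.layer[4].answerOut(p.layer[3]); // linear output layer produces the 4096-pixel image
    return p.layer[4].allanswers();
  }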
Thanks again for all your help!