Tags: machine-learning, neural-network, normalization, genetic-algorithm, chess

Should I normalize the inputs in my Neural Network?


First, some context.

I'm taking on a very ambitious project: making a Neural Network capable of playing Chess at a decent level. I might not succeed, but I'm doing it mainly to learn how to approach this kind of machine learning.

I've decided I want to train the network using a genetic algorithm to fine-tune the weights after different neural nets have fought against each other in a few games of chess.

Each neuron uses a hyperbolic tangent (range -1 to 1) to squash its output after the weighted sum has been computed, but no normalization is applied yet to the inputs before they enter the network.
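As a rough sketch of what that means (names and weights here are made up for illustration), a tanh neuron looks like this, and it shows why raw, unscaled inputs can be a problem: large input ranges push the weighted sum deep into tanh's flat, saturated region.

```python
import math

def tanh_neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs, squashed into (-1, 1) by tanh."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(weighted_sum)

# With raw inputs like a piece count (0-8), a square index (0-63), and an
# attacked-value sum (0-12), the weighted sum easily saturates tanh:
print(tanh_neuron([8, 63, 12], [0.5, 0.5, 0.5], 0.0))  # prints 1.0 (saturated)
```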

I've taken some inspiration from the Giraffe chess engine, particularly the inputs.

They are going to look sort of like this:

First layer:

  • number of remaining White Pawns (0-8)

  • number of remaining Black Pawns (0-8)

  • number of remaining White Knights (0-2)

  • number of remaining Black Knights (0-2)

....

Second layer still on the same level as the first:

  • Position of Pawn 1 (probably going with 2 values, x[0-7] and y[0-7])
  • Position of Pawn 2

...

  • Position of Queen 1
  • Position of Queen 2

...

Third layer, again on the same level as the previous two. The data is only going to "crosstalk" after the next layer of abstraction.

  • Values of pieces attacked by Pawn1 (this is going to be in the 0-12 ish range)
  • Values of pieces attacked by Pawn2

...

  • Value of pieces attacked by Bishop1

You get the idea.
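To make the layout concrete, here is a hypothetical sketch of how such a flat input vector could be assembled (the function name and arguments are mine, not from the question or from Giraffe): piece counts, per-piece coordinates, and attacked-value sums are simply concatenated into one list before being fed to the network.

```python
def build_input_vector(white_pawns, black_pawns, pawn_positions, attacked_values):
    """Concatenate the three raw feature groups into one flat input vector."""
    features = []
    features.append(white_pawns)        # remaining White Pawns, 0-8
    features.append(black_pawns)        # remaining Black Pawns, 0-8
    for x, y in pawn_positions:         # each coordinate in 0-7
        features.extend([x, y])
    features.extend(attacked_values)    # each roughly in 0-12
    return features

vec = build_input_vector(8, 7, [(0, 1), (4, 6)], [0, 3])
print(vec)  # [8, 7, 0, 1, 4, 6, 0, 3]
```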

If you didn't, here's a terrible Paint representation of what I mean:

[Image: Neural Net Representation]

The question is: should I normalize the input data before it is read by the Neural Network?

I feel like squishing the data might not be such a good idea but I really don't have the competence to make a conclusive call.

I hope someone here can enlighten me on the subject and if you think I should normalize the data, I would like it if you could suggest some ways of doing so.

Thanks!


Solution

  • You shouldn't need to normalize anything inside the network. The point of machine learning is to train weights and biases to learn a non-linear function, which in your example would be static chess evaluation. Thus, your second Normalized blue vertical bar (near the final output) is unnecessary.

    Note: hidden layer is better terminology than abstraction layer, so I'll use that instead.

    The other normalization you have before the hidden layers is optional but recommended. It also depends on what input we're talking about.

    The Giraffe paper writes in page 18:

    "Each slot has normalized x coordinate, normalized y coordinate ..."

    Chess has 64 squares; without normalization, the range would be [0, 1, ..., 63]. This is very discrete, and the range is much larger than that of the other inputs (more on that later). It does make sense to normalize the squares to something more manageable and comparable to the other inputs. The paper doesn't say how exactly it normalizes them, but I don't see why a [0...1] range wouldn't work. It makes sense to normalize chess squares (or coordinates).
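One simple way to do this (a sketch, not how Giraffe necessarily does it, and assuming a little-endian square index where 0 = a1 and 63 = h8) is to split the square index into x and y coordinates and divide each by 7:

```python
def normalize_square(square):
    """Map a 0-63 square index to (x, y) coordinates, each in [0, 1]."""
    x = square % 8   # file, 0-7
    y = square // 8  # rank, 0-7
    return x / 7.0, y / 7.0

print(normalize_square(0))   # (0.0, 0.0), i.e. a1
print(normalize_square(63))  # (1.0, 1.0), i.e. h8
```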

    The other inputs, such as whether there's a queen on the board, are true or false, and thus require no normalization. For example, the Giraffe paper writes on page 18:

    ... whether piece is present or absent ...

    Clearly, you wouldn't normalize it.

    Answer to your question

    • If you represent the Piece Count Layer as in Giraffe, you shouldn't need to normalize. But if you prefer a discrete representation in [0..8] (and note that a side could have up to 9 queens through promotion), you might want to normalize.
    • If you represent Piece Position Layer with chess squares, you should normalize just like Giraffe.
    • Giraffe doesn't normalize the Piece Attack Defense Layer, possibly because it represents the information as the lowest-valued attacker and defender of each square. Unfortunately, the paper doesn't explicitly state how this is done. Your implementation might require normalization, so use your common sense.

    Without any prior assumption about which features would be more relevant for the model, you should normalize them to a comparable scale.
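Since every feature in your scheme has a known fixed range, plain min-max scaling is the obvious candidate. A minimal sketch (the example ranges come from your question, not from Giraffe):

```python
def min_max_scale(value, lo, hi):
    """Scale a feature with a known range [lo, hi] into [0, 1]."""
    return (value - lo) / (hi - lo)

print(min_max_scale(8, 0, 8))   # pawn count at its maximum -> 1.0
print(min_max_scale(6, 0, 12))  # attacked-piece value sum -> 0.5
print(min_max_scale(3, 0, 7))   # a single coordinate -> ~0.43
```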

    EDITED

    Let me answer your comment. Normalization is not the correct term; what you're describing is an activation function (https://en.wikipedia.org/wiki/Activation_function). Normalization and activation functions are not the same thing.
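To spell out the distinction with a tiny (illustrative, made-up) example: normalization is a one-time preprocessing step applied to the data before it enters the network, while an activation function runs inside every neuron, after the weighted sum, and is part of the model itself.

```python
import math

# Normalization: preprocessing, applied once to the raw input data.
raw_file = 5                      # x coordinate of a piece, in 0-7
normalized = raw_file / 7.0       # scaled into [0, 1] before entering the net

# Activation function: applied inside each neuron, after the weighted sum.
weighted_sum = 0.8 * normalized - 0.1   # toy weight and bias
output = math.tanh(weighted_sum)        # tanh is the activation, not normalization
```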