java neural-network encog

Self learning neural network with Encog


Can a neural network (back-propagation, feed-forward) learn autonomously, by trial and error, how to control a propeller so that it stabilizes and avoids falling?

Neural network type: multi-layered, back-propagation feed-forward, sigmoid activation.

For simplicity, only vertical propeller control and vertical speed are considered.

Input: vertical speed.

Output: propeller power (the propeller is aimed downwards only, so it can only push upwards).

Since a back-propagation NN derives its error from the difference between the actual output and the desired output, how can it teach itself without knowing the desired output? (In fact, the desired output is exactly what it needs to learn.)

Taking the vertical speed as the error (stopped = no error) would be more suitable, but how can I change the error function of Encog's ResilientPropagation or BackPropagation classes?

Do I need to write the whole network class myself to achieve this type of learning? There is no initial training data, only data newly generated from engine power and velocity. (If I could generate training data, I would already know how to control the engine, so no NN would be needed.)

What is the most fitting neural network type for this job?


Solution

  • As noted in the comment by @larsmans, this can be solved with the Reinforcement Learning paradigm. In the context of neural networks, currently the most popular (and only?) approach is to use two neural networks:

    • actor network: learns which action (propeller power in this case) the agent ought to take in a given state (vertical speed in this case)

    • critic network: learns state values, i.e. how much future reinforcement the agent can "hope" to achieve from a given state

    This approach is known as Actor-Critic methods. The only additional thing you need to do is design the reinforcement function. In your case it seems quite simple: it could be based on the vertical velocity (lower speed means higher reinforcement), with an additional penalty for deviating from some predefined height (otherwise the networks will simply learn to wait until the propeller has fallen and stopped by itself).

    The main issue will be tuning all the parameters so that everything works correctly. However, the problem seems very simple, so it may not be very hard.
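
To make the actor-critic idea more concrete, here is a minimal sketch in plain Java. It deliberately does not use Encog. To stay short it swaps the two neural networks for two linear function approximators over a small feature vector, so it illustrates the update rules rather than a full network-based solution. Everything in it is an assumption made for illustration: the class name `PropellerActorCritic`, the toy physics (`G`, `DT`, `MAX_THRUST`), the state (height error and vertical velocity), the reward weights, and all learning rates. The reward follows the suggestion above: it penalizes vertical speed plus deviation from a target height.

```java
import java.util.Random;

/** Minimal one-step actor-critic sketch for a 1-D propeller (illustrative only). */
public class PropellerActorCritic {
    // Assumed toy physics and hyperparameters; none of these come from the question.
    static final double G = 9.81, DT = 0.05, MAX_THRUST = 20.0;
    static final double GAMMA = 0.95, ALPHA_CRITIC = 0.01, ALPHA_ACTOR = 0.001, SIGMA = 1.0;

    /** Reinforcement signal: penalize vertical speed and deviation from the target height. */
    static double reward(double heightError, double velocity) {
        return -(Math.abs(velocity) + 0.5 * Math.abs(heightError));
    }

    /** One Euler step of the toy dynamics; returns {heightError, velocity}. */
    static double[] step(double heightError, double velocity, double thrust) {
        double clamped = Math.max(0.0, Math.min(MAX_THRUST, thrust)); // propeller only pushes up
        double v = velocity + (clamped - G) * DT;
        return new double[] { heightError + v * DT, v };
    }

    /** Feature vector: bias term plus roughly scaled height error and velocity. */
    static double[] features(double heightError, double velocity) {
        return new double[] { 1.0, heightError / 10.0, velocity / 10.0 };
    }

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    /** Trains and returns the actor weights; the critic is used only during training. */
    static double[] train(int episodes, long seed) {
        Random rng = new Random(seed);
        double[] actor = new double[3];   // mean thrust = actor . features
        double[] critic = new double[3];  // state value = critic . features
        for (int ep = 0; ep < episodes; ep++) {
            double h = 0.0, v = 0.0;
            for (int t = 0; t < 200; t++) {
                double[] phi = features(h, v);
                double mean = dot(actor, phi);
                double action = mean + SIGMA * rng.nextGaussian(); // exploration noise
                double[] next = step(h, v, action);
                boolean done = Math.abs(next[0]) > 10.0;           // drifted too far: end episode
                double r = reward(next[0], next[1]);
                double nextValue = done ? 0.0 : dot(critic, features(next[0], next[1]));
                // TD error: how much better or worse the outcome was than the critic expected.
                double tdError = r + GAMMA * nextValue - dot(critic, phi);
                for (int i = 0; i < 3; i++) {
                    critic[i] += ALPHA_CRITIC * tdError * phi[i];
                    // Gaussian policy gradient: d log pi / d mean = (action - mean) / sigma^2
                    actor[i] += ALPHA_ACTOR * tdError * (action - mean) / (SIGMA * SIGMA) * phi[i];
                }
                if (done) break;
                h = next[0];
                v = next[1];
            }
        }
        return actor;
    }

    public static void main(String[] args) {
        double[] actor = train(500, 42L);
        System.out.printf("actor weights: %.3f %.3f %.3f%n", actor[0], actor[1], actor[2]);
    }
}
```

Note that no desired output is ever supplied. The critic's TD error plays the role that the output error plays in supervised back-propagation, and it is computed only from the reward and the critic's own predictions. In a network-based version, the same TD error would presumably drive the weight updates of both networks. Whether this particular sketch stabilizes the propeller depends on the tuning mentioned above.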