tensorflow artificial-intelligence tensorflow2.0 reinforcement-learning

Training an AI algorithm to learn new features

When looking into AI, I only ever see a single training period, after which the model has learned and is perfect. But what if the data doesn't have a true pattern, like financial prices or playing a game, for example? Then your algorithm fails to learn and you are left with nothing.

I did some research into OpenAI and how they taught AI algorithms to play Dota 2. One of the programmers said that over the weekend he taught the algorithm how to block creeps by giving it rewards. Did they take the existing model, add some rewards for when the character was standing in front of creeps, and then let it rip, so that it would all of a sudden learn a new skill?

There is no information about how this is done! It's more of a progressive learning system rather than a one-time train and done. Please shed some light on this process and how I can train a financial algorithm to learn new "features".

Solution

Online vs offline learning

Take a step back and look at machine learning in general to understand the difference between online and offline learning. In current usage, "artificial intelligence" mostly refers to the part of machine learning built on neural networks. What you refer to as "one training period" is called offline (or batch) learning, and what you are looking for is online learning.

In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. [https://en.wikipedia.org/wiki/Online_machine_learning]
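To make the definition concrete, here is a minimal sketch of online learning in plain Python: a linear model updated by one gradient step per incoming observation, never revisiting old data. The data source `make_stream`, the learning rate, and the target line y = 2x + 1 are all assumptions made up for illustration.

```python
import random

def online_sgd(stream, lr=0.05):
    """Fit y ~ w*x + b, updating after every single observation."""
    w, b = 0.0, 0.0
    for x, y in stream:
        error = (w * x + b) - y   # prediction error on the newest sample
        w -= lr * error * x       # one gradient step on the squared error
        b -= lr * error
    return w, b

def make_stream(n, seed=0):
    """Hypothetical data source: y = 2x + 1 plus a little noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)

w, b = online_sgd(make_stream(5000))
print(w, b)  # should end up close to the slope and intercept of the stream
```

A batch learner would instead collect all 5000 points first and fit them in one go; the online version is usable after every step and keeps adjusting if the stream changes.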

The key is to teach your model incrementally with new data without making it forget previous knowledge. A famous toy problem is the non-stationary multi-armed bandit, whose reward parameters change over time; it is a common way of introducing reinforcement learning concepts to students.
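A rough sketch of such a bandit, under assumptions chosen purely for illustration: two arms whose payoffs swap halfway through the run, an epsilon-greedy agent, and a constant step size `alpha` so that recent rewards outweigh old ones. None of the numbers come from a real system.

```python
import random

def run_bandit(steps=4000, switch_at=2000, epsilon=0.1, alpha=0.1, seed=0):
    rng = random.Random(seed)
    estimates = [0.0, 0.0]  # the agent's running estimate of each arm's payoff
    for t in range(steps):
        # Non-stationarity: the better arm changes at step `switch_at`.
        true_means = [1.0, 0.0] if t < switch_at else [0.0, 1.0]
        if rng.random() < epsilon:
            arm = rng.randrange(2)                       # explore
        else:
            arm = 0 if estimates[0] >= estimates[1] else 1  # exploit
        reward = rng.gauss(true_means[arm], 0.1)
        # Constant step size: old rewards are forgotten exponentially.
        estimates[arm] += alpha * (reward - estimates[arm])
    return estimates

estimates = run_bandit()
print(estimates)  # arm 1 should be rated higher after the switch
```

The continued exploration and the constant step size are what let the agent notice the switch; an agent that averaged all past rewards equally and stopped exploring would stay stuck on the arm that used to be best.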

Reinforcement learning

You can formulate this problem as an agent-environment model, where your model plays the role of an agent choosing from a set of actions (buy/sell) based on the current state of the environment (stock prices) while maximizing a reward function (value of the portfolio). State-of-the-art RL algorithms, such as the one behind OpenAI's Dota bot, also use deep learning, which is why they are classified as artificial intelligence.
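The agent-environment loop can be sketched with tabular Q-learning, used here only because it is short; it is not the method behind the Dota bot. The "market" is a deliberately artificial assumption: price moves alternate up and down, the state is the direction of the last move, and the reward is the change in portfolio value for the chosen position.

```python
import random

ACTIONS = [0, 1]  # 0 = stay flat, 1 = hold one unit long

def train(steps=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]; state 0 = last move down, 1 = up
    state = 0
    for _ in range(steps):
        # Agent: epsilon-greedy choice based on the current state.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        # Environment (toy assumption): moves strictly alternate +1, -1, +1, ...
        next_move = 1.0 if state == 0 else -1.0
        reward = action * next_move          # change in portfolio value
        next_state = 1 if next_move > 0 else 0
        # Q-learning update toward reward plus discounted best future value.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in (0, 1)]
print(policy)  # expected: long after a down move, flat after an up move
```

Changing what `reward` pays for changes what the agent learns, and training can continue from the existing `q` table instead of starting over. Deep RL follows the same loop but replaces the table with a neural network.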

Take a look at deep reinforcement learning to learn more.