machine-learning · reinforcement-learning · temporal-difference

Gradient Temporal Difference Lambda without Function Approximation


Every formalism of GTD(λ) seems to define it in terms of function approximation, using θ and some weight vector w.

I understand that the need for gradient methods largely came from their convergence properties with linear function approximators, but I would like to make use of GTD for its importance sampling.

Is it possible to take advantage of GTD without function approximation? If so, how are the update equations formalized?


Solution

  • I understand that when you say "without function approximation" you mean representing the value function V as a table. In that case, note that a tabular representation of V can itself be seen as a (linear) function approximator.

    For example, if we define the approximated value function as:

    $$V_\theta(s) = \theta^\top \phi(s) = \sum_{i=1}^{n} \theta_i \, \phi_i(s)$$

    Then, using a tabular representation, there are as many features as states, and the feature vector φ(s) for a given state s is zero everywhere except at the entry for s, where it equals one. The parameter vector θ then simply stores one value per state. Therefore GTD, like other algorithms, can be used in a tabular setting without any modification (see the sketch below).
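
    For concreteness, here is a minimal sketch of what a tabular gradient-TD update could look like. It assumes the TDC/GTD(0) form of the updates (as given, e.g., in Sutton & Barto, 2nd ed., Section 11.7) rather than the full GTD(λ) algorithm, and the state-space size, step sizes, and helper names are illustrative only. With one-hot features, θ is literally the value table.

    ```python
    import numpy as np

    # Sketch of tabular GTD(0)/TDC for off-policy evaluation, assuming the
    # standard linear updates:
    #   delta  = r + gamma * theta^T phi(s') - theta^T phi(s)
    #   theta += alpha * rho * (delta * phi(s) - gamma * (w^T phi(s)) * phi(s'))
    #   w     += beta  * rho * (delta - w^T phi(s)) * phi(s)
    # With one-hot features, theta^T phi(s) is just theta[s].

    def one_hot(s, n_states):
        """Tabular feature vector: 1 at index s, 0 elsewhere."""
        x = np.zeros(n_states)
        x[s] = 1.0
        return x

    def gtd0_step(theta, w, s, r, s_next, rho,
                  alpha=0.1, beta=0.05, gamma=0.99):
        """One TDC/GTD(0) update; rho = pi(a|s) / mu(a|s) is the
        importance-sampling ratio."""
        phi, phi_next = one_hot(s, len(theta)), one_hot(s_next, len(theta))
        delta = r + gamma * theta @ phi_next - theta @ phi   # TD error
        theta += alpha * rho * (delta * phi - gamma * (w @ phi) * phi_next)
        w += beta * rho * (delta - w @ phi) * phi
        return theta, w

    # Usage: theta ends up holding one value estimate per state.
    theta, w = np.zeros(5), np.zeros(5)
    theta, w = gtd0_step(theta, w, s=0, r=1.0, s_next=1, rho=1.0)
    ```

    The only difference from the function-approximation presentation is that φ(s) is a one-hot vector, so every dot product with θ or w collapses to a single table lookup.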