I'm looking at Andrej Karpathy's "Training a Neural Network ATARI Pong agent with Policy Gradients from raw pixels" https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5 . I'm not a Python person so I'm trying to write this code in Matlab. I have 2 questions.
Question 1: I noticed that xs, hs, dlogps, and drs are initialized to [],[],[],[] (line 67) and reset to [],[],[],[] after each episode (line 103). But epx, eph, epdlogp, and epr are neither initialized nor reset. They seem to keep growing forever (lines 99-102). Am I correct? I'm not familiar with the nuances of np.vstack.
Question 2: If I had a game with player movement options up, down, right, and left, how would I need to modify this code to make it work (besides the obvious modification to 4 nodes in the output layer)?
Thanks.
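I think you're imagining nuances of numpy.vstack that it doesn't have. Lines 99-102 of the code you linked to assign the result of the vstack function to the variables concerned, so any previous values of these variables are replaced. Python variables need no separate initialization before assignment, and vstack returns a new array rather than appending to an existing one. Because xs, hs, dlogps, and drs are reset after each episode, each stacked array holds only the most recent episode's data and does not grow across episodes.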
epx = np.vstack(xs)
eph = np.vstack(hs)
epdlogp = np.vstack(dlogps)
epr = np.vstack(drs)
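To see the rebinding in action, here is a minimal sketch that mimics the buffer pattern on toy data. The episode count, the 4 steps per episode, the 5-element observations, and the shapes list are all made up for illustration and are not part of the gist.

```python
import numpy as np

xs = []       # per-episode buffer, playing the role of xs in the gist
shapes = []   # hypothetical helper: shape of epx after each episode

for episode in range(3):
    for step in range(4):                # assume 4 frames per episode
        xs.append(np.full(5, episode))   # stand-in for one observation
    epx = np.vstack(xs)   # builds a new (4, 5) array and rebinds the name epx
    xs = []               # reset the buffer, as the gist does after each episode
    shapes.append(epx.shape)

print(shapes)   # [(4, 5), (4, 5), (4, 5)]
```

The shape stays the same from episode to episode, and after the loop epx contains only values from the last episode. If you keep xs as a cell array in Matlab, the comparable pattern would be something like epx = vertcat(xs{:}) followed by xs = {}, rather than epx = [epx; x].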
For the second part of your question I think you need to try something out, and ask a new question showing what you've tried if it doesn't work.