python matlab machine-learning neural-network reinforcement-learning

Karpathy's code training neural net to play Pong using Policy Gradients

I'm looking at Andrej Karpathy's "Training a Neural Network ATARI Pong agent with Policy Gradients from raw pixels" https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5 . I'm not a Python person, so I'm trying to port this code to MATLAB. I have two questions.

Question 1: I noticed that xs, hs, dlogps, and drs are initialized to [],[],[],[] (line 67) and reset to [],[],[],[] after each episode (line 103). But epx, eph, epdlogp, and epr are neither initialized nor reset. They seem to keep growing forever (lines 99-102). Am I correct? I'm not familiar with the nuances of np.vstack.

Question 2: If I had a game with the player movement options up, down, right, and left, how would I need to modify this code to make it work (besides the obvious change to 4 nodes in the output layer)?

Thanks.

Solution

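I think you're imagining nuances of numpy.vstack that it doesn't have. np.vstack keeps no state between calls: it builds a new array from the list you pass it and returns that array. Lines 99-102 of the code you linked assign that return value to the variables concerned, so any previous value of those variables is replaced, not appended to. Python also doesn't require a variable to be initialized before you assign to it, which is why epx, eph, epdlogp, and epr are never set up beforehand. So no, they don't keep growing: after each episode they hold only what was in xs, hs, dlogps, and drs, and those lists are reset on line 103.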

epx = np.vstack(xs)
eph = np.vstack(hs)
epdlogp = np.vstack(dlogps)
epr = np.vstack(drs)
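To see the replacement behaviour in isolation, here is a small sketch that mimics two episodes of different lengths. The helper name stack_episode and the toy two-feature observations are made up for illustration; only np.vstack and the reset-to-[] pattern come from the gist.

```python
import numpy as np

def stack_episode(rows):
    """Stack a list of 1-D arrays into one 2-D array, one row per list item."""
    return np.vstack(rows)

# Episode 1: three time steps, each observation has 2 features (toy data).
xs = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
epx = stack_episode(xs)
shape_after_first = epx.shape

# Reset the per-step buffer, as the gist does after each episode.
xs = []

# Episode 2: only two time steps.
xs.append(np.array([7.0, 8.0]))
xs.append(np.array([9.0, 10.0]))
epx = stack_episode(xs)  # the name epx now refers to a brand-new array
shape_after_second = epx.shape

print(shape_after_first, shape_after_second)
```

The second shape has two rows, not five, because the assignment rebinds epx to a fresh array built only from the current contents of xs. For the MATLAB port, if xs were kept as a cell array of row vectors, vertcat(xs{:}) should play a similar role.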

For the second part of your question, I think you need to try something out first, and then ask a new question showing what you've tried if it doesn't work.
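As a possible starting point for that experiment, one common way to handle more than two actions is to replace the single sigmoid output with a softmax over 4 outputs. The sketch below is not from the gist; NUM_ACTIONS, the function names, and the example logits are all assumptions for illustration. It shows only the part that changes: sampling an action and computing the gradient of the log-probability with respect to the output logits, which for softmax is one-hot(action) minus the probabilities.

```python
import numpy as np

NUM_ACTIONS = 4  # assumed ordering: up, down, right, left

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

def sample_action(probs, rng):
    """Draw an action index according to the policy probabilities."""
    return int(rng.choice(len(probs), p=probs))

def grad_log_prob_wrt_logits(probs, action):
    """Gradient of log pi(action) w.r.t. the logits: onehot(action) - probs."""
    onehot = np.zeros_like(probs)
    onehot[action] = 1.0
    return onehot - probs

rng = np.random.default_rng(0)
logits = np.array([0.1, -0.3, 0.5, 0.0])  # stand-in for the network's output layer
probs = softmax(logits)
action = sample_action(probs, rng)
dlogp = grad_log_prob_wrt_logits(probs, action)  # one row of what dlogps would collect
```

Under these assumptions, each entry appended to dlogps would be a length-4 vector instead of a scalar, so epdlogp would have one row per time step and 4 columns, and W2 and the backward pass would need to be reshaped to match.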