Tags: python, arrays, numpy, reinforcement-learning

NumPy - How to get an array of the pattern gamma^t for t = 0, 1, 2, ...?


I am creating a basic gridworld RL problem and I need to calculate the return for some given episode. I currently have the array of rewards, and I would like to element-wise multiply this with a list of the form:

[gamma**0, gamma**1, gamma**2, ....]

In order to get:

[r_0*gamma**0, r_1*gamma**1, r_2*gamma**2, ....]

and then use np.sum() to get the entire return.

How can I complete that first step? I tried using np.logspace, but it isn't quite what I want (or I'm doing it wrong).


Solution

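  • Assuming the reward array and gamma look something like this:

    import numpy as np

    n = 20                                # episode length
    reward = np.random.randint(0, 10, n)  # example rewards r_0 .. r_{n-1}
    gamma = 2                             # example value; a discount factor is usually in [0, 1]

    # gamma ** np.arange(n) is [gamma**0, gamma**1, ..., gamma**(n-1)]
    np.sum(reward * (gamma ** np.arange(n)))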
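Since the question mentions np.logspace, here is a minimal sketch of how that route could also work, next to the np.arange approach wrapped in a small function. The name discounted_return and the sample values for rewards and gamma are made up for illustration and are not part of the original answer.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """Hypothetical helper: sum over t of gamma**t * rewards[t]."""
    rewards = np.asarray(rewards, dtype=float)
    # Discount vector [gamma**0, gamma**1, ..., gamma**(n-1)]
    discounts = gamma ** np.arange(rewards.size)
    # Dot product = element-wise multiply followed by a sum
    return float(np.dot(rewards, discounts))

# Illustrative values only
rewards = [1.0, 2.0, 3.0]
gamma = 0.5
n = len(rewards)

# np.logspace(start, stop, num, base) computes base ** np.linspace(start, stop, num),
# so exponents 0 .. n-1 with base=gamma give the same discount vector.
discounts_logspace = np.logspace(0, n - 1, num=n, base=gamma)

g = discounted_return(rewards, gamma)
print(g)
```

The np.arange version is probably the easier one to read; with np.logspace the key detail is passing base=gamma and making the exponents run from 0 to n - 1 in exactly n steps.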