I have installed Gym Retro and am running the Super Mario Bros. game in interactive mode. A reward is printed on every move I make manually, and I want to know how this reward is calculated. If someone can point me to the .py file (location, line number), that would be great.
I have gone through previous similar questions and through the code in retro_env.py, but could not find the code for the Mario step and reward.
When I backtracked, I reached retro_env.py. Below is the step function, which should return the reward:
    def step(self, a):
        if self.img is None and self.ram is None:
            raise RuntimeError('Please call env.reset() before env.step()')
        for p, ap in enumerate(self.action_to_array(a)):
            if self.movie:
                for i in range(self.num_buttons):
                    self.movie.set_key(i, ap[i], p)
            self.em.set_button_mask(ap, p)
        if self.movie:
            self.movie.step()
        self.em.step()
        self.data.update_ram()
        ob = self._update_obs()
        rew, done, info = self.compute_step()
        return ob, rew, bool(done), dict(info)
However, it calls self.compute_step(), which is:
    def compute_step(self):
        if self.players > 1:
            reward = [self.data.current_reward(p) for p in range(self.players)]
        else:
            reward = self.data.current_reward()
        done = self.data.is_done()
        return reward, done, self.data.lookup_all()
This function calls current_reward() of GameDataGlue from retro._retro. However, there is no _retro folder in site-packages, so I am not sure how current_reward is calculated.
I would like to understand how the Mario reward is calculated, so that I can apply the same approach to other games or even to my own custom environment.
I figured out the answer. The file scenario.json in Lib\site-packages\retro\data\stable\SuperMarioBros-Nes contains the reward calculation. For example, the original entries were:
    "reward": {
        "variables": {
            "xscrollLo": {
                "reward": 1
            }
        }
    }
So when Mario moved to the right the reward was updated, but collecting coins did not change it.
I changed it to look like this:
    "reward": {
        "variables": {
            "xscrollLo": {
                "reward": 2
            },
            "coins": {
                "reward": 1
            }
        }
    }
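If you would rather not hand-edit the file, the same change can be sketched in Python using only the standard json module. This is an illustration, not part of Gym Retro: the helper name set_reward_weight is made up here, the snippet works on an in-memory copy of the reward section quoted above rather than on the real scenario.json, and it assumes the variable name (for example coins) is one the game's data file already defines.

```python
import copy
import json

# Original reward section as quoted above, wrapped in braces to form valid JSON.
ORIGINAL = json.loads('{"reward": {"variables": {"xscrollLo": {"reward": 1}}}}')


def set_reward_weight(scenario, variable, weight):
    """Return a copy of `scenario` in which `variable` carries reward `weight`.

    Hypothetical helper for illustration; it does not touch any file on disk.
    """
    updated = copy.deepcopy(scenario)
    variables = updated.setdefault("reward", {}).setdefault("variables", {})
    variables.setdefault(variable, {})["reward"] = weight
    return updated


# Reproduce the manual edit: double the scroll weight and add a coin reward.
modified = set_reward_weight(ORIGINAL, "xscrollLo", 2)
modified = set_reward_weight(modified, "coins", 1)
print(json.dumps(modified, indent=2))
```

Working on a copy keeps the stock scenario intact, so it is easy to go back to the default reward.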
Now my score increases when I collect coins. Sample output below:
steps=6720 episode_steps=6720 episode_returns_delta=80.0 episode_returns=3959.0
steps=6780 episode_steps=6780 episode_returns_delta=1.0 episode_returns=3960.0
steps=6840 episode_steps=6840 episode_returns_delta=1.0 episode_returns=3961.0
The return increases by 1 point at a time here because I was collecting 1 coin per step.
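To make the arithmetic concrete, here is a rough Python sketch of how I understand the per-step reward to come out of those entries. It assumes each listed variable contributes its change since the previous step multiplied by its "reward" weight; that matches the output above, but the real calculation happens inside the compiled module, so treat delta_reward and the sample values as my own illustration rather than the library's code.

```python
def delta_reward(reward_variables, previous, current):
    """Sum weight * (current - previous) over the rewarded variables.

    Assumption: every variable is measured as a per-step delta.
    """
    total = 0.0
    for name, spec in reward_variables.items():
        total += spec["reward"] * (current[name] - previous[name])
    return total


# Weights from the modified scenario.json shown earlier.
REWARD_VARIABLES = {"xscrollLo": {"reward": 2}, "coins": {"reward": 1}}

# Standing still while picking up one coin.
coin_only = delta_reward(
    REWARD_VARIABLES,
    previous={"xscrollLo": 40, "coins": 3},
    current={"xscrollLo": 40, "coins": 4},
)

# Scrolling 5 units to the right without picking up a coin.
scroll_only = delta_reward(
    REWARD_VARIABLES,
    previous={"xscrollLo": 40, "coins": 4},
    current={"xscrollLo": 45, "coins": 4},
)

print(coin_only, scroll_only)  # prints: 1.0 10.0
```

Under that assumption a single coin gives exactly 1 point, which is the episode_returns_delta=1.0 seen in the log.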
(Though if someone can point to the source code behind _retro.pyd, that would be great.)
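For context, a .pyd file is a compiled extension module (a Windows DLL), which is why there is no _retro folder or .py file to open in site-packages; the code behind current_reward() would have to be read from the library's native sources, not from the installed package. A small sketch to check where Python actually loads the module from, using only the standard library (module_origin is a name I made up):

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, or None if unavailable."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package (here: retro) cannot be imported.
        return None
    return spec.origin if spec is not None else None


# With Gym Retro installed this should print the path of the compiled file;
# without it, None is printed.
print(module_origin("retro._retro"))
```

If the printed path ends in .pyd (or .so on Linux), the module is native code and cannot be viewed as Python source.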