Tags: python, unity-game-engine, artificial-intelligence, ml-agents

Weird results with the Unity ML-Agents Python API


I am using the 3DBall example environment, but I am getting some really weird results that I don't understand. My code so far is just a for loop that reads the reward and fills in the required inputs with random values. However, a negative reward is never shown, and at random points there are no decision steps. That part might make sense, but shouldn't the environment just keep simulating until there is a decision step? Any help would be greatly appreciated, as other than the documentation there are few resources out there for this.

import random
import numpy
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment()
env.reset()
behavior_names = env.behavior_specs

for i in range(50):
    arr = []
    behavior_names = env.behavior_specs
    for name in behavior_names:
        print(name)
    DecisionSteps = env.get_steps("3DBall?team=0")
    print(DecisionSteps[0].reward, len(DecisionSteps[0].reward))
    # For some reason action_mask is False when DecisionSteps[0].reward is empty, and None when it is not
    print(DecisionSteps[0].action_mask)

    for j in range(len(DecisionSteps[0])):
        arr.append([])
        for b in range(2):
            arr[-1].append(random.uniform(-10, 10))
    if len(DecisionSteps[0]) != 0:
        env.set_actions("3DBall?team=0", numpy.array(arr))
        env.step()
    else:
        env.step()
env.close()

Solution

  • I think your problem is that when the simulation terminates and needs to be reset, the agent does not return a decision_step but rather a terminal_step. This happens because the agent has dropped the ball, and the reward returned in the terminal_step will be -1.0. I have taken your code and made some changes, and now it runs fine (except that you will probably want to change it so that the environment is not reset every time one of the agents drops its ball).

    import numpy as np
    from mlagents_envs.environment import UnityEnvironment
    
    # -----------------
    # Close an env that might still be open from a previous run
    try:
        env.close()
    except Exception:
        pass
    # -----------------
    
    env = UnityEnvironment(file_name=None)  # file_name=None connects to a Unity Editor instance waiting on Play
    env.reset()
    
    for i in range(1000):
        arr = []
        behavior_names = env.behavior_specs
    
        # Go through all existing behaviors
        for behavior_name in behavior_names:
            decision_steps, terminal_steps = env.get_steps(behavior_name)
    
            for agent_id_terminated in terminal_steps:
                print("Agent " + behavior_name + " has terminated, resetting environment.")
                # This is probably not the desired behaviour, as the other agents are still active. 
                env.reset()
    
            actions = []
            for agent_id_decisions in decision_steps:
                actions.append(np.random.uniform(-1,1,2))
    
            # print(decision_steps[0].reward)
            # print(decision_steps[0].action_mask)
    
            if len(actions) > 0:
                env.set_actions(behavior_name, np.array(actions))
        try:
            env.step()
        except Exception:
            print("Something happened when taking a step in the environment.")
            print("The communicator has probably terminated, stopping simulation early.")
            break
    env.close()
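
A side note that is not part of the original answer: newer releases of the ML-Agents Python API (roughly mlagents_envs from Release 10 onward) no longer accept a raw NumPy array in set_actions; actions have to be wrapped in an ActionTuple. If the code above raises a type error on set_actions, a minimal sketch of the newer call, reusing behavior_name and decision_steps from the loop, looks like this:

    import numpy as np
    from mlagents_envs.base_env import ActionTuple

    # One row of continuous actions (2 values for 3DBall) per agent
    # that requested a decision this step.
    continuous = np.random.uniform(-1, 1, size=(len(decision_steps), 2)).astype(np.float32)

    # Wrap the array in an ActionTuple before handing it to the environment.
    env.set_actions(behavior_name, ActionTuple(continuous=continuous))
    env.step()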