
Difficulty to specify a long-term view when assigning utilities to local states


I am currently reading Wooldridge's An Introduction to MultiAgent Systems (Wiley), and I was hoping somebody could clarify the following for me. When speaking about utility functions, the author states:

A utility is a numeric value representing how "good" the state is: the higher the utility, the better.

The task of the agent is then to bring about states that maximize utility - we do not specify to the agent how this is to be done. In this approach, a task specification would simply be a function

u:E -> R 

which associates a real value with every environment state.

Given such a performance measure, we can then define the overall utility of an agent in some particular environment in several different ways. One (pessimistic) way is to define the utility of the agent as the utility of the worst state that might be encountered by the agent; another might be to define the overall utility as the average utility of all states encountered. There is no right or wrong way: the measure depends upon the kind of task you want your agent to carry out.

The main disadvantage of this approach is that it assigns utilities to local states; it is difficult to specify a long-term view when assigning utilities to individual states.

I am having problems understanding the disadvantage and what exactly a local state is. Could somebody clarify this?


Solution

  • I will show an example here to explain the idea; I hope it helps. For more details, see the slides.

    The problem:

    This is a classic problem called Tile World.

    • Two-dimensional grid world, in which we have an agent, tiles, obstacles and holes.
    • An agent can move in four directions (up,down,left,right) and if it is located next to a tile, it can push it in the appropriate direction.
    • Holes have to be filled up with tiles by the agent.
    • The aim is to fill all holes with tiles.

    [figure: a Tile World grid containing the agent, tiles, obstacles, and holes]

    Environment State

    The state of the environment can be described using the following variables:

    • The agent's current position (a_x, a_y)
    • The current positions of the four tiles: (t1_x, t1_y), (t2_x, t2_y), (t3_x, t3_y), (t4_x, t4_y)
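A minimal sketch of such a state in Python (the class and field names are my own; the book does not prescribe a representation):

```python
from dataclasses import dataclass
from typing import Tuple

Pos = Tuple[int, int]  # (x, y) grid coordinates

@dataclass(frozen=True)
class TileWorldState:
    """One environment state: the agent's position plus each tile's position."""
    agent: Pos               # (a_x, a_y)
    tiles: Tuple[Pos, ...]   # ((t1_x, t1_y), ..., (t4_x, t4_y))
```

Making the dataclass frozen keeps states immutable, so each action produces a new state rather than mutating the old one.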

    State Transfer

    Say the agent, in the current state, pushes the tile directly beneath it one cell down. The system then transfers to the next state, in which every variable stays the same except the agent's position and the position of the tile being pushed.
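This transition can be sketched as a function on plain position tuples (a hypothetical helper; I assume y grows downward, so "down" means y + 1):

```python
def push_down(agent, tiles, i):
    """Agent pushes tile i (directly below it) one cell down.

    Positions are (x, y) pairs with y growing downward (an assumption of
    this sketch). Everything except the agent and tile i is carried over
    unchanged to the next state.
    """
    ax, ay = agent
    tx, ty = tiles[i]
    assert (tx, ty) == (ax, ay + 1), "tile i must be directly below the agent"
    new_tiles = list(tiles)
    new_tiles[i] = (tx, ty + 1)            # the pushed tile moves one cell down
    return (ax, ay + 1), tuple(new_tiles)  # agent steps into the tile's old cell
```

A real implementation would also check for obstacles and grid boundaries; this only illustrates that a single action changes just two variables of the state.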

    Utility function

    Our utility function can be defined as the fraction of holes that have been filled, i.e.,

                # of holes filled
       u =  -------------------------
                # of total holes 
    

    It's apparent that:

    • If the agent fills all holes, utility = 1
    • If the agent fills zero holes, utility = 0
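The formula above translates directly into code (a sketch; hole positions are assumed known to the evaluator):

```python
def utility(hole_positions, tile_positions):
    """Fraction of holes currently covered by a tile: filled / total."""
    filled = sum(1 for hole in hole_positions if hole in tile_positions)
    return filled / len(hole_positions)
```

With three holes and one covered, this returns 1/3; with all three covered it returns 1, and with none covered it returns 0, matching the two extremes listed above.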

    Associating utility function

    Now look at the two states below.

    [figures: two states, s1 (left) and s2 (right), each with one of the three holes filled]

    It's easy to see that:

    • Both states have the same utility value, 1/3 (because 1 out of 3 holes is filled)
    • The left state (s1) is a dead position, in which you can no longer move all the remaining tiles into holes
    • The right state (s2) is a good position, in which you still have options to move the remaining two tiles into holes

    So the conclusions are:

    • If you associate the utility function only with a local state, e.g. u(s1) or u(s2), you cannot tell the difference between them in terms of utility: u(s1) = u(s2) = 1/3.

    • You need a global, long-term view of the states, which can be represented as a run: a sequence of interleaved environment states and the actions the agent takes.

    • You can assign a utility not to individual states, but to runs. Such an approach takes an inherently long term view.

      u: run -> real value

    • In this setup, the agent's optimal strategy is to maximize expected utility. This does not guarantee the best outcome on any single run, but on average we can expect the agent to do best.

      expected utility = sum of ( u(r) x Prob(r) )
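The expected-utility formula is just a probability-weighted sum over runs, which can be sketched as:

```python
def expected_utility(runs):
    """Sum of u(r) * Prob(r) over runs.

    runs: iterable of (utility_of_run, probability_of_run) pairs,
    where the probabilities should sum to 1.
    """
    return sum(u * p for u, p in runs)
```

For example, a strategy that reaches a utility-1 run half the time and a utility-0 run the other half has expected utility 0.5, even though no individual run ever scores 0.5.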

    Please refer to the book you mentioned or the corresponding slide for more details.