Search code examples
algorithmoptimizationgraphshortest-patha-star

By how much suboptimal can be path found with A* using heuristic which overestimates the remaining distance a little?


Consider you use A* algorithm in which heuristic can overestimate the remaining distance by a few meters. Can it happen that the final path is several kilometres longer than the really shortest path? Can you give an example of graph in which this happens, what kind of graph is it? A scenario in which Euclidean (straight line) distance can overestimate the remaining distance is:

  • The graph vertices are situated in (x, y) coordinates on a plane, where x and y are floating-point
  • There are arcs of some floating-point lengths between some vertices of the graph. The length of an arc is no smaller than the Euclidean distance between its vertices (but can be greater for bended/non-straight arcs)
  • However, while running A* algorithm you use integer arithmetic with rounding down, while A* estimate is rounded up (this is unreasonable, but just an example of how small the differences are): so you round the length of each arc down to integer number of meters, and you round A* estimate up to integer number of meters

Is there a formula which says the upper bound on suboptimality of the final path given the upper bound on how much A* heuristic overestimates the remaining distance?


Solution

  • A* returns when the partial answer it retrieves (which is the partial answer with smallest estimated total distance to reach the goal) has in fact reached the goal. Standard A* guarantees to find the correct answer because, by the definition of the heuristic, all the estimated total distances to reach the goal are lower bounds, so none of the other answers can do better.

    Suppose the heuristic for an answer which in fact will end up with total distance T can be up to KT, where K > 1. If A* retrieves an answer with cost KT and thinks it has succeeded because the answer reaches the goal, then it might not be the best answer. We know that every partial answer still in the pool has cost at least KT, but because of the heuristic, an answer with heuristic KT might actually turn into an answer with total cost T < KT (but cannot turn into an answer with any cheaper cost). So with this sort of heuristic, you the answer returned by A* can be up to K times as expensive as the best answer, but no more.

    This is actually summarized in the Wikipedia entry at http://en.wikipedia.org/wiki/A_search_algorithm#Bounded_relaxation - using such heuristics deliberately is one way to speed up A at the cost of returning worse answers.