Why does A* algorithm find the an optimal path without visiting all the nodes?

I understand that if a heuristic is admissible, A* does not to visit every node to find the most optimal path. Looking at visualizations of each algorithm, A* stops as soon as it reaches its goal node. So how are you sure that your path is the most optimal one if you haven't explored all possible paths to the goal node? How does overestimating each cost path ensure the optimal solution?

Solution

Consider Dijkstra algorithm, that is a special case of A* in which the heuristic is always 0. Dijkstra visits nodes in order of shortest distance from the source, and as such the moment it visits the target node, you know it's the shortest path, since all the unvisited nodes are strictly further away, so going through them can't possibly yield a shorter path.

Now in A* you visit nodes in order of their distances from source plus the value of the heuristic. We assume that the value of the heuristic for a given node doesn't exceed the shortest distance from that node to the target. When you visit the target for the first time, let's call the distance from the source to the target at that moment is d_t. Consider any node v you haven't visited yet. You know that the distance from the source to that node d_v plus the value of the heuristic h_v is larger than d_t (because you visit nodes in the order of that value, so all unvisited nodes have that value larger). Also note that the shortest distance from v to t is larger than or equal to h_v by the assumption we made above. From that it follows that the distance from source to target through v will be longer than d_t, so there's no reason to consider that node. Thus d_t is the shortest path from source to target.