Has anyone created a successful algorithm for determining good heuristic functions based on training data? Is it even possible?
If you're talking about training data, I assume you're thinking of, for instance, Machine Learning algorithms?
If you want the A* algorithm to be guaranteed to give you an optimal solution if one exists, you must use an admissible heuristic function. This is a function that never overestimates the distance between two nodes.
I'll assume the training data you're thinking of looks like a big table, where each row has a pair of nodes, and each row is labelled with the true distance. Then I assume you're thinking of training a Machine Learning algorithm to estimate the distance between pairs of nodes, and use that estimate of the distance as the heuristic. This is definitely possible, and I think may give decent results even in some cases, but I don't think it's likely that you're going to be able to guarantee that such a heuristic is still admissible. So, using a technique like this would likely result in losing the theoretical guarantee of finding optimal solutions. It may still be useful in practice though, for finding (not necessarily optimal) solutions.