Search code examples
pythondeep-learningneural-networkpath-findingtraveling-salesman

Can deep learning be used for general path finding?


A few weeks ago I was introduced to the traveling salesman problem (TSP): I've read about how this problem applies to many real world scenarios such as the drilling of holes in PCB's, in logistics, in the delivery of packages to different houses,...

I wanted to create a python program that solves the problem and prints out the "cities to visit" in order.

I've also read articles that talk about how deep learning (especially deep-q-learning) can be used to solve path finding problems, so I've started experimenting with gymnasium's FrozenLake and torch's neural networks (for deep q learning).

What I am trying to create now is a warehouse-product-pathfinder in python which, given the names of the products (and their locations on the shelves), it calculates the optimal path to reach them. I was thinking about how to implement it when this question struck me: is deep learning really used for path finding in general?

Think about it: in FrozenLake there holes you must avoid, a walkable path and a reward you have to get to; the neural network gets trained on this specific map for many episodes, until it learns from its past experience what actions to avoid or do in specific places (states). But in the TSP case, everything changes quickly: the next time around, the products that the warehouse worker has to load on the forklift will change, the holes on the PCB will have to be drilled in different spots, and the houses the delivery guy has to reach will be different. How can the agent/neural network learn from its past experience if the "cities" change every time? Wouldn't it be better just to use a non-machine-learning algorithm to solve the problem?

I've thought of other problems as well regarding my case, for example: "when the number of products change, how can I change the number of neurons in the input layer of the neural network?" Or: "When a product gets loaded onto the forklift, it has to be removed from the list of products to get to, and since the coordinates of the products are part of the environment state, the neural network will have 2 extra neurons".


Solution

  • Feasibility aside, I myself don't find it worthwhile during a similar dev. It's more like an overkill by me.

    Firstly, TSP or pathfinding in a fully known graph (e.g., your warehouse scenario) are better solved using classical optimization algorithms, as they are more efficient and require no training cost.

    Secondly, as you correctly noted, in TSP-like scenarios where the input (e.g., cities, products, or delivery points) changes frequently, training a deep learning model for one instance may not generalize to other instances unless the problem structure remains consistent. Also, it would be a huge challenge how to represent the states, a poor state representation ruins Reinforcement Learning for sure.

    Moreover, keep in mind evaluation of your model costs too, I doubt whether it outperforms comparing with applying classical schemes individually to each of your task.

    Instead, you may try and use the following classical schemes to address it:

    Greedy Algorithm: Start at the initial location, and repeatedly go to the nearest unvisited product. This is a fast heuristic but may not always give the optimal solution.

    Dynamic Programming: Such as Held-Karp for small numbers of products.

    In fact, you'd better turn to lib like Google OR-Tools to solve TSP efficiently even for larger instances, provided that you have access to them. A well developed wheel should always be your very first choice.