So I am dealing with graphs and use algorithms like the DFS and BFS to figure out all paths in them to a specific vertex. However, in my examples the DFS was always faster when finding those paths than the BFS.
Do you have any ideas how my graph should look like in order that the BFS runs (significantly) faster? Thanks in advance
When target is quite close, but DFS loose itself exploring lot of "far from start node" nodes.
For example, for a GPS, if you are trying to find a path from Dallas to Tucson, but your algorithm explore first edges that go north, and explore any single road in Canada, from Montreal to Alaska before lastly coming back to Texas.
It is not the usual case, since that would be bad luck. Plus, usually DFS comes with some sort of heuristic (here, first edge would be something going west rather than north). So, usually the tradeoff is "DFS is generally faster to find a path, but does not guaranty that this path is the shortest, while BFS guaranty that, but need to explore all path as short as the solution to find it".
Back to my Southern example, general case would be rather that DFS will find an almost direct, but not optimal path from Dallas to Tucson, without even exploring Houson, Memphis or Kansas City. While BFS will find a guaranteed optimal path, but to be sure it is optimal, it needs to explore any road in a circle centered around Dallas and with radius Dallas-Tucson distance, including any impasse in any suburb of Houston, New Orleans, Denver, Albuquerque, Nashville, Atlanta, and any ditch road in coutry in between, etc.
But DFS is not even guaranteed to not do the same extensive exploration. It is just less likely that it does, especially if it has an heuristic. So, again, in that Dallas->Tucson example, nothing says that your DFS algorithm won't start with exploring all roads around Montréal.
So, if you are trying to build an example of DFS slower than BFS, play with big graphs (the bigger, the more likely it is to occur), with a target not far from start (compared to the width of the graph). And then count on bad luck to find at least one disastrous iteration of DFS. And, of course, don't help DFS with an heuristic (like prioritize edges that diminish the "as the crow flies" distance), or else you need to build examples specifically designed for this heuristic to fail (so for example, having your target west of starting point; having a huge subtree starting west of your start point, that does not includes target, so that the only way to target is to go east first, then turn around the said subtree with a rather direct path. Your DFS will then, even with heuristic, explore the whole subtree, when BFS would have found the target before exploring the whole subtree)
In bold (red or blue) the explored arcs (and nodes). In red, the found path. Arcs are labelled with the order of exploration (so that you can check that, indeed, all by DFS examples are valid DFS)
BFS case: at the end, best path is found, guaranteed. But all path (or almost so) of same length have also been explored. Here best path is length 2. But before we found it we explored all path of length 1, and some other paths of length 2, that is, we explored 6 edges.
Both property (find best path, but explore everything shorter than that) are related. It is because we have explored everything shorter first that we know, when we find a solution, that if there were a shorter solution, we would have seen it before.
In my south-US example, it is because we have explored any point closer to Houston than Tucson that we known, when we reach Tucson, that we have the best path. But it cost us some wandering around Nashville and others.
When we choose to explore beginning with a node B, we continue everything from there, and start to explore siblings only when we have exhausted anything possible from B.
In this example, we were very lucky: we started with B, which leads directly to the best path. So solution length=2 (best), explored edges: 2 also.
This is the typical case for DFS (tho, again, no guaranty that it is the worst case): we weren't lucky enough to find the best path. But, still, we found quickly a path. So solution length=3 (not optimal), but exploration size = 4 edges (so faster than BFS).
This is the case because of which we often talk of a tradeoff: not the best path, but explores less things than BFS.
Again, no guarantee for that. It is just the typical case. And you can make it very probable in real life cases, with an heuristic. Following cases are still possible, but in real life, with an heuristic, they are improbable.
We eventually found the best path (length=2). But it cost us the exploration of a huge impasse, so we had to explore 17 edges. Even more than BFS.
We have not eaten our cake, and yet we don't have it: the path we found eventually is not even optimal, and yet, it cost the huge price of 18 edges exploration (of course, when I say "huge" it is relative to the size of the graph. Real life graph have thousands and millions of edges).
That case is possible. This could be one possible result of a DFS.
Yet in real life, with an heuristic, it is practically impossible. For this to happen with my GPS for Houston-Tucson, you would have to imagine a scenario where you can reach LittleTown a suburb, 5 miles from Tucson, but with no connection at all to Tucson, such as, from LittleTown, the only way to Tucson, is to go back to Houston and then take another parallel highway that lead to Tucson. So from Houston you have 2 roads, one going to Tucson, and another to LittleTown, and that those 2 roads have no connection at all to each other, except in Houston.
That, of course doesn't exist. Reason why, even tho it is possible to build (as I did) examples of DFS taking more time than BFS to find a worse path, it is quite usual to say that tradeoff is "BFS find optimal path, but takes time to do so; DFS works faster, but doesn't find necessarily the best path".