python algorithm social-networking traversal breadth-first-search

Python usage of breadth-first search on social graph

I've been reading a lot of Stack Overflow questions about how to use breadth-first search, DFS, A*, etc. My question is: what is the optimal usage, and how do you implement it on real versus simulated graphs? E.g.

Say you have a social graph from Twitter, Facebook, or some other social networking site. To me it seems a search algorithm would work as follows:

If user A had 10 friends, and one of those had 2 friends and another had 3, the search would first figure out who user A's friends were, then it would have to look up who the friends of each of those ten users were. To me this seems like BFS?

However, I'm not sure if that's the way to go about implementing the algorithm.

Thanks,

Solution

For my two cents, if you're just trying to traverse the whole graph it doesn't matter a whole lot what algorithm you use so long as it only hits each node once. This seems to be what you're saying when you note:

I'm just trying to traverse the whole graph

This means your terminology is technically flawed: you're talking about walking a graph, not searching a graph, unless you're actually trying to search for something in particular, which you don't mention in the question at all.

With that said, Facebook and Twitter have very different graph structures, and that does affect how you walk them:

Facebook is fundamentally an undirected graph. If X is friends with Y, Y MUST be friends with X. (Or in a relationship with, or related to, etc.)
Twitter is fundamentally a directed graph. If X follows Y, Y does not have to follow X.
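To make the distinction concrete, here is a minimal sketch. It assumes the graph arrives as a list of (source, target) pairs; `build_adjacency` and `reachable` are hypothetical helper names, not part of any Facebook or Twitter API. The same edge list is stored once as an undirected "friends" graph and once as a directed "follows" graph, and the walk visits each node at most once:

```python
def build_adjacency(edges, directed):
    """Map each node to the set of nodes one hop away."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        # Register the target too, so nodes without outgoing edges still appear.
        adjacency.setdefault(target, set())
        if not directed:
            # Friendship is mutual, so store the reverse edge as well.
            adjacency[target].add(source)
    return adjacency


def reachable(adjacency, start):
    """Return every node reachable from start, visiting each node once."""
    seen = {start}
    pending = [start]
    while pending:
        node = pending.pop()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                pending.append(neighbour)
    return seen


edges = [("X", "Y"), ("Z", "X")]
friends = build_adjacency(edges, directed=False)  # Facebook-style
follows = build_adjacency(edges, directed=True)   # Twitter-style

print(reachable(friends, "X"))  # X, Y and Z, in some order
print(reachable(follows, "X"))  # only X and Y: Z follows X, not the other way round
```

With directed edges, a walk that starts at X never reaches Z, so which nodes you see depends on where you start.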

These differences will significantly impact the graph walking algorithm. To be quite honest, if you just want to visit all the nodes, do you even need a graph? Why not just iterate over all of them? If you have all the nodes in some iterable data structure MY_DATA, you could just write a generator function like this:

def nodeGenerator(MY_DATA):
    # Yield the nodes one at a time, in whatever order MY_DATA stores them.
    for node in MY_DATA:
        yield node

Clearly, you'd need to adjust the nodeGenerator internals to handle how you're actually accessing the nodes. With that said, most graph structures implement a node iterator. Then you can just create an iterator anytime you want to do things via:

for node in nodeGenerator(MY_DATA):
    pass  # do something with node here

Maybe I'm just missing the point of the question here, but at present you've posed a question about search algorithms without a search problem. Due to the No Free Lunch nature of optimization and search, the worth of any search algorithm will be entirely dependent on the search problem you're trying to examine.

This is true even within the same data set. After all, if you were searching for everybody whose name starts with the letter D, a great approach would be to just sort everyone alphabetically and do a binary search. If instead you're trying to find everyone's degree of separation from Kevin Bacon, you're going to want an algorithm that starts with Mr. Bacon and recursively iterates over everyone who knows him and everyone who they know. These are both things you COULD do on Facebook or Twitter, but without any specifics there's really no way to recommend an algorithm. Hence, if you know nothing, just iterate over everyone as a list. It's just as good as anything else. If you then want to optimize, cache any calculations.
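For what it's worth, the Kevin Bacon case is where breadth-first search earns its keep, because it reaches people in order of distance from the start. Below is a rough sketch, assuming the graph is a plain dict mapping each person to a list of people they know. The `degrees_of_separation` name and the sample data are made up for illustration, and it uses a queue rather than literal recursion:

```python
from collections import deque


def degrees_of_separation(graph, start):
    """Return {person: number of hops from start} for everyone reachable."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            # The first time we see someone is via a shortest path,
            # because BFS expands people in order of distance.
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return distance


# Made-up sample data; Erin is not connected to anyone.
graph = {
    "Kevin Bacon": ["Alice", "Bob"],
    "Alice": ["Kevin Bacon", "Carol"],
    "Bob": ["Kevin Bacon"],
    "Carol": ["Alice", "Dave"],
    "Dave": ["Carol"],
    "Erin": [],
}

for person, hops in degrees_of_separation(graph, "Kevin Bacon").items():
    print(person, hops)
```

Anyone missing from the result (Erin here) has no path to the starting person at all. On a real social network you would probably also cap the depth, since the number of people grows very quickly with each hop.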