database, social-networking

Representation of Database as a Network


I'm working on a project of mine which requires searching with respect to a person's position in a network. Basically I need to define a database which contains some users with different connections.

As in a network, the users are the nodes and the connections are the edges. Friendship is one type of edge; there may be other types of connections as well. The connections might also carry some sort of weight or priority.

Now, when a user searches for another user, the results should be ordered so that the nearest nodes come first and the furthest nodes come last, based on the connections and their weights/priorities.

I can't figure out where to start. I don't need ready-made code for this, but I need to learn. So please suggest tutorials or articles on this topic if you know any. You can also point me to code I can learn from.

Another question: can a MySQL database be used to represent this type of network, or do I need a special database?


Solution

  • If you are free to choose the database engine, graph databases are the best fit for this kind of data. You can find a fairly complete list on Wikipedia: http://en.wikipedia.org/wiki/Graph_database

    Also take a look at this video: http://www.youtube.com/watch?v=UodTzseLh04

    I had a good experience with Neo4J (http://www.neo4j.org/). It is written in Java but provides bindings for many languages (from JVM languages it can also run in embedded mode). You can also use it via a REST interface. The graph query language is Cypher (http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html), which is not so different from SQL.

    However, one key point to consider in your evaluation is the size of your graph. The main drawback of modeling such highly connected data is that it is difficult to scale efficiently across multiple machines (partitioning a graph optimally is an NP-hard problem). Neo4J can handle a huge number of nodes on a single machine, but if you need a truly massive graph, I suggest you try Titan (http://thinkaurelius.github.com/titan/).

    More info about Titan: http://www.slideshare.net/slidarko/titan-the-rise-of-big-graph-data

    And if in the future you need heavy processing: http://thinkaurelius.github.com/faunus/
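
To get a feel for the "nearest nodes first" ordering before picking a database, here is a minimal in-memory sketch in Python. It is not taken from any of the tools above. It assumes each connection weight is a cost (lower means a closer connection) and ranks users with Dijkstra's shortest-path algorithm. The function name `rank_by_distance`, the dict-of-dicts graph layout, and the sample users are all hypothetical and only serve as illustration.

```python
import heapq


def rank_by_distance(graph, source):
    """Return (user, distance) pairs sorted from nearest to furthest.

    graph maps each user to a dict {neighbor: cost}. Assumption: a lower
    cost means a closer connection. Unreachable users are left out.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, user = heapq.heappop(heap)
        if d > dist.get(user, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, cost in graph.get(user, {}).items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    # Sort by distance first, then by name to break ties deterministically.
    ranked = sorted((d, u) for u, d in dist.items() if u != source)
    return [(u, d) for d, u in ranked]


# Hypothetical sample network (undirected, so each edge is listed both ways).
graph = {
    "alice": {"bob": 1, "carol": 4},
    "bob": {"alice": 1, "carol": 2, "dave": 5},
    "carol": {"alice": 4, "bob": 2, "dave": 1},
    "dave": {"bob": 5, "carol": 1},
    "erin": {},  # no connections, so never reachable from alice
}

results = rank_by_distance(graph, "alice")
print(results)  # [('bob', 1), ('carol', 3), ('dave', 4)]
```

A search could then filter these results by name and keep the order. If your weights express priority, where higher means closer, one option is to convert them to costs first (for example `1 / priority`), since Dijkstra's algorithm requires non-negative costs.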