database graph analytics archiving

Graph database functionality


I recently learned about graph databases and read that they have limited analytics capabilities. The claim comes from http://www.readwriteweb.com/enterprise/2009/02/is-the-relational-database-doomedp2.php: "Things like tracking usage patterns and providing recommendations based on user histories may be difficult at best, and impossible at worst, with this type of database platform."

1. Why are analytics limited in this kind of database?

2. How can a graph database be used for archiving? For example, Facebook stores all the posts made by millions of users. How could that be done in a graph database?


Solution

  • If you apply a strict Property Graph Model, you get very good "data-local" operations, such as exploring the data surrounding a node, say, 5 hops deep along its relationships. However, global operations such as "give me all nodes that have a name attribute matching 'Tom*'" require a full scan of the data in a pure graph model. In theory, this is a limitation. In practice (as in http://neo4j.org), the graph engine is combined with global indexes such as Lucene, Berkeley DB or Cassandra, which take care of the data-global aspects that are common in certain analytics scenarios.

    So there is no real limitation, just different patterns for dealing with the global and local operations on your data. See http://wiki.neo4j.org/content/Domain_Modeling_Gallery for some examples of modeling domains as graphs, or GIS examples such as https://github.com/neo4j/neo4j-spatial/raw/master/src/site/pics/one-street.png, built on the OpenStreetMap graph.

    For archiving vast amounts of data, as Facebook does, I would store only the last month or so of status updates in the graph, for fast retrieval and recommendations. The rest I would archive in a system such as Cassandra, keeping in the graph only a reference to the archive and its key metrics, so that the "archived subgraph" can be retrieved if needed.
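
A minimal Python sketch of the points above, under stated assumptions. `TinyGraph`, `archive_old_posts`, the `posts:...` and `archive:...` key formats, and the 30-day window are all hypothetical names and values chosen for illustration. One plain dict stands in for an external index such as Lucene, and another stands in for an archive store such as Cassandra. This is not how Neo4j works internally; it only contrasts a data-local traversal with a data-global lookup, and shows one possible way to leave an archive reference in the graph.

```python
from collections import defaultdict, deque
from datetime import date, timedelta


class TinyGraph:
    """Toy in-memory property graph (illustrative only, not Neo4j's design)."""

    def __init__(self):
        self.props = {}                      # node id -> property dict
        self.adj = defaultdict(set)          # node id -> neighbouring node ids
        self.name_index = defaultdict(set)   # name -> node ids (stand-in for Lucene)

    def add_node(self, node_id, **props):
        self.props[node_id] = props
        if "name" in props:
            self.name_index[props["name"]].add(node_id)

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def remove_node(self, node_id):
        props = self.props.pop(node_id)
        if "name" in props:
            self.name_index[props["name"]].discard(node_id)
        for other in self.adj.pop(node_id, set()):
            self.adj[other].discard(node_id)

    def neighbourhood(self, start, max_hops):
        """Data-local: breadth-first search that only touches nodes near `start`."""
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue
            for nxt in self.adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
        return seen - {start}

    def scan_by_name_prefix(self, prefix):
        """Data-global without an index: has to visit every node."""
        return {n for n, p in self.props.items()
                if p.get("name", "").startswith(prefix)}

    def lookup_by_name(self, name):
        """Data-global with an index: exact-match lookup, no full scan."""
        return set(self.name_index.get(name, ()))


def archive_old_posts(graph, archive, user_id, today, keep_days=30):
    """Move posts older than `keep_days` out of the graph into `archive`,
    leaving a reference node with a key and a simple metric."""
    cutoff = today - timedelta(days=keep_days)
    old = [n for n in graph.adj[user_id]
           if graph.props[n].get("kind") == "post"
           and graph.props[n]["posted"] < cutoff]
    if not old:
        return None
    key = f"posts:{user_id}:before:{cutoff.isoformat()}"   # hypothetical key format
    archive[key] = [dict(graph.props[n], id=n) for n in old]
    for n in old:
        graph.remove_node(n)
    ref = f"archive:{user_id}"
    graph.add_node(ref, kind="archive_ref", archive_key=key, post_count=len(old))
    graph.add_edge(user_id, ref)
    return key


g = TinyGraph()
g.add_node("tom", name="Tom")
g.add_node("ann", name="Ann")
g.add_node("bob", name="Bob")
g.add_edge("tom", "ann")
g.add_edge("ann", "bob")
g.add_node("p1", kind="post", posted=date(2024, 1, 5))
g.add_node("p2", kind="post", posted=date(2024, 3, 1))
g.add_edge("tom", "p1")
g.add_edge("tom", "p2")

archive = {}   # stand-in for an external store such as Cassandra
key = archive_old_posts(g, archive, "tom", today=date(2024, 3, 10))
```

After this runs, the older post `p1` lives only in `archive[key]`, while the graph keeps the recent post `p2` plus an `archive:tom` node that records where the archived posts went and how many there are. `g.neighbourhood("tom", 2)` stays cheap no matter how large the rest of the graph is, whereas `g.scan_by_name_prefix("To")` grows with the total node count, which is the gap the external index is meant to close.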