graph-databases titan ibm-graph

Is this data model optimal for a basic news feed in TitanDB?


I am using TitanDB (IBM Graph) rather than Neo4j, but because I am new to graph databases, I have for now modelled a basic news feed using the schema suggested in the Neo4j documentation.

http://neo4j.com/docs/snapshot/cypher-cookbook-newsfeed.html

Having fully read all the documentation, I am aware of several key differences between the way these databases operate.

In the model described in the link, each of a user's posts is stored as a vertex, and the posts are connected to each other by edges, forming a long list of status updates emanating from each user vertex.
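
To make that layout concrete, here is a minimal in-memory sketch in Python. It is not Neo4j or Titan code: the `Post` class, the `latest` dictionary and the `add_post` and `walk` helpers are hypothetical names invented for the illustration, and it assumes each user vertex points at its newest post while each post points at the one before it.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Post:
    text: str
    timestamp: int
    older: Optional["Post"] = None  # stands in for the edge to the previous post


# Stands in for the edge from a user vertex to that user's newest post.
latest: Dict[str, Optional[Post]] = {}


def add_post(user: str, text: str, timestamp: int) -> None:
    """Prepend a post so the newest one is always the head of the user's list."""
    latest[user] = Post(text, timestamp, older=latest.get(user))


def walk(user: str) -> Iterator[Post]:
    """Yield a user's posts newest-first by following the chain one hop at a time."""
    post = latest.get(user)
    while post is not None:
        yield post
        post = post.older


add_post("alice", "first", 1)
add_post("alice", "second", 2)
print([p.text for p in walk("alice")])
```

Because `walk` is a generator, a reader only pays for the hops it actually consumes, which is the property the linked-list layout relies on.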

While this makes sense given Neo4j's capabilities, I am aware that TitanDB has vertex-centric indexing abilities, described in detail here:

http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html

Right now I am trying to ensure that querying for a given user's feed is optimal for a large graph with many users and many permanently kept posts or status updates. I would therefore like to avoid traversing all the posts of all of a user's friends, then ordering and limiting them, just to get the first 15 items of that user's feed.

As such, I am unsure whether the model described in the Neo4j documentation is really the best one to use with TitanDB, so my questions are as follows:

  • Is the model described in the Neo4j documentation optimal for fast news feed retrieval in TitanDB?
  • If so, what indexes would I need to create in order to retrieve a user's feed optimally?
  • If not, would I be better off connecting each post vertex directly to the user who posted it, and using a vertex-centric index on the time property of each posted edge?

I'm really after some general advice on modelling, indexing and retrieving a basic news feed in TitanDB. Thanks in advance.


Solution

  • The basic schema doesn't seem like a bad approach, though it's difficult to make a good judgement based on this one use case.

    The simplest approach to solving your indexing problem is probably to denormalize a bit - store the user id as a property on the post vertex and create an index on the [user, timestamp] pair.

    Vertex-centric indexes might help you, but not in the proposed model: you'd need to model each post as an edge, not a vertex, which may make other traversals rather awkward. Furthermore, IBM Graph does not support vertex-centric indexes as of its current release.
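
    The denormalized approach can be illustrated with a small in-memory Python sketch. It is not Titan or IBM Graph code: the sorted list `index` merely stands in for a database index on the [user, timestamp] pair, and `add_post`, `recent_posts` and `feed` are hypothetical helper names. It also assumes the index can be read in timestamp order for a given user, which should be checked against the documentation of whichever index type is actually used.

```python
import bisect
import heapq
from itertools import islice
from typing import Iterator, List, Tuple

# Entries are (user_id, -timestamp, post_id): grouped by user, newest first.
index: List[Tuple[str, int, str]] = []


def add_post(user_id: str, timestamp: int, post_id: str) -> None:
    """Insert into the sorted list, keeping it ordered like an index would be."""
    bisect.insort(index, (user_id, -timestamp, post_id))


def recent_posts(user_id: str) -> Iterator[Tuple[int, str]]:
    """Lazily yield (-timestamp, post_id) for one user, newest first."""
    i = bisect.bisect_left(index, (user_id,))  # first entry for this user
    while i < len(index) and index[i][0] == user_id:
        yield index[i][1], index[i][2]
        i += 1


def feed(friend_ids: List[str], limit: int = 15) -> List[str]:
    """Merge the friends' newest-first streams and stop after `limit` items."""
    merged = heapq.merge(*(recent_posts(f) for f in friend_ids))
    return [post_id for _, post_id in islice(merged, limit)]


add_post("bob", 10, "b1")
add_post("bob", 30, "b2")
add_post("carol", 20, "c1")
add_post("dave", 40, "d1")
print(feed(["bob", "carol"], limit=2))
```

    Since the merge is lazy, building a feed reads roughly `limit` entries plus one per friend, rather than every post of every friend followed by a sort.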