Social Networking Graph Database and Profile Information

I am new to graph databases and have a question which might be easy to answer for you guys.

If I decide to use a graph database (e.g. Neo4j) for a social-networking-style application, do I also store profile information and posts in that database, or would I need a second database such as MySQL?

All the examples I have found store only a few properties in the graph database (e.g. name and relation), hence my question.

Many Thanks

Solution

You can store that information directly in the graph database. Not only is it possible but it is even recommended as that information might affect the various traversal queries you would like to run against the database.

At some point you may want to create a separate layer for users that interact a lot (via posts). For that, it is handy to have the post information available in the graph database.

The same goes for profile information. If you want to run a traversal only over a specific type of profile (single men, etc.), you need to have that information at hand within your graph database.
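To make that concrete, here is a minimal in-memory sketch in plain Python. It is not the Neo4j API; the `PropertyGraph` class, the `FRIEND` label and the `gender`/`status` properties are all made up for illustration. The point is only that the traversal can filter on profile properties because they live on the nodes themselves.

```python
class PropertyGraph:
    """Toy property graph: nodes carry properties, edges carry a label."""

    def __init__(self):
        self.nodes = {}  # node id -> dict of profile properties
        self.edges = {}  # node id -> list of (label, target id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def neighbours(self, node_id, label):
        return [t for (edge_label, t) in self.edges[node_id] if edge_label == label]


def friends_of_friends(graph, start, predicate):
    """Two-hop traversal that keeps only profiles matching `predicate`."""
    direct = set(graph.neighbours(start, "FRIEND"))
    result = set()
    for friend in direct:
        for candidate in graph.neighbours(friend, "FRIEND"):
            if candidate == start or candidate in direct:
                continue
            # The filter reads profile properties stored on the node.
            if predicate(graph.nodes[candidate]):
                result.add(candidate)
    return result


g = PropertyGraph()
g.add_node("alice", gender="f", status="single")
g.add_node("bob", gender="m", status="single")
g.add_node("carol", gender="f", status="married")
g.add_node("dave", gender="m", status="single")
g.add_edge("alice", "FRIEND", "bob")
g.add_edge("bob", "FRIEND", "alice")
g.add_edge("bob", "FRIEND", "carol")
g.add_edge("bob", "FRIEND", "dave")

single_men = friends_of_friends(
    g, "alice", lambda p: p.get("gender") == "m" and p.get("status") == "single"
)
```

If the profiles lived in a separate MySQL database, the same query would need a round trip to that database for every candidate node, or a second filtering pass after the traversal.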

I would say: unless you have a specific reason to use MySQL alongside your graph database, you should probably do everything in the graph database.

Some things to keep in mind:

Indexing can be tricky depending on your needs. You usually have to give your data model a little thought instead of jumping into it blindly.
Keep an eye out for the maximum number of elements the graph db of your choice can handle.
Some content, such as image blobs, might not be appropriate to store in the database. I've never looked into this, though, so I could be wrong.

Extra concerns:

So I should create a node for each profile, containing the profile properties and relations to the post nodes?

This is a little tricky. The answer is yes, but depending on the number of posts a user has made, that user node (vertex) could become a supernode. A "supernode" is a vertex with a disproportionately high number of incident edges, which can lead to performance issues because a traversal through it may have to scan all of those edges. To counter this, you will want to make sure your graph database can handle such vertices properly, generally by implementing vertex-centric indices. I haven't checked in a while, but last time I did, Neo4j did not support these. OrientDB and Titan (among others) do. Someone can correct me if Neo4j has some support now.

It's going to depend on how you decide to traverse the graph, how many outgoing edges you think you'll have to scan in your traversal, and so on. In general you need to start the process by figuring out what queries you're going to make, and then model the graph accordingly.
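As a rough illustration of what a vertex-centric index buys you, here is a small Python sketch. It does not reflect how Titan or OrientDB implement it internally; the `Vertex` class, the `POSTED` label and the numeric sort key (think of it as a timestamp) are assumptions made for the example. Each vertex keeps its edges grouped by label and sorted by a key, so a query such as "this user's most recent posts" touches only the matching slice instead of scanning every incident edge.

```python
import bisect
from collections import defaultdict


class Vertex:
    """Vertex whose edges are indexed locally by label and sort key."""

    def __init__(self, vertex_id):
        self.vertex_id = vertex_id
        # edge label -> list of (sort_key, target id), kept sorted by sort_key
        self.edges_by_label = defaultdict(list)

    def add_edge(self, label, sort_key, target_id):
        # insort keeps the per-label list ordered as edges are added
        bisect.insort(self.edges_by_label[label], (sort_key, target_id))

    def edges_in_range(self, label, low, high):
        """Return targets of `label` edges with low <= sort_key < high."""
        edges = self.edges_by_label.get(label, [])
        # (low,) sorts before any (low, target), so this finds the first match
        start = bisect.bisect_left(edges, (low,))
        end = bisect.bisect_left(edges, (high,))
        return [target for _, target in edges[start:end]]


user = Vertex("alice")
for day in range(1, 1001):
    user.add_edge("POSTED", day, f"post-{day}")
user.add_edge("FRIEND", 0, "bob")

# Binary search over the POSTED edges only; FRIEND edges are never touched.
recent_posts = user.edges_in_range("POSTED", 998, 1001)
```

Whether this matters in practice comes back to the queries: if no traversal ever passes through a user's post edges, a user with many posts is not a problem.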

What's up with the difference in limitations between Neo4j and OrientDB?

Neo4j's limitations seem to be theoretically defined. From the extra comments here, it seems they plan on increasing them shortly. OrientDB and Titan are designed as databases for significantly larger graphs, which is why their limits are higher at the moment. In reality, even if those limits are quite far apart, the real question is "are you going to hit them?" I've answered yes to this in the past, and that is why I use Titan today. But Neo4j's limits are usually high enough to cater to most people's needs.