Tags: c#, sql-server, neo4j, neo4jclient, entity-framework-6.1

Practical performance comparison of Neo4j and MSSQL for C# developers


Assume we have a website with a small social graph in which people (say ~1M users) can "like" stuff, follow each other, comment on each other's posts, and so on (the usual scenario).

In .NET, we have two options for this:

  1. Using EF (currently 6.1) and MSSQL (v2012 or above) to implement the social graph (the hard way)
  2. Using Neo4j (currently 2.1.4) and Neo4jClient (which as far as I know is the best driver for .NET users)

Given this scenario, and given that Neo4j doesn't have a native driver for .NET and the current version of Neo4jClient (1.0.0.657) uses the REST API to connect to the database engine, which option would be faster for questions like "Who likes stuff like I do?", "What would a person like (based on the people they follow)?", and other common social-graph questions?
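To make "Who likes stuff like I do?" concrete on the relational side, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for MSSQL. The `users`/`likes` tables, the sample rows, and the `who_likes_what_i_like` helper are all hypothetical; a LINQ query through EF would translate to SQL of roughly this shape, not necessarily this exact text. The point is that the question is a self-join on the likes table: my likes, then the same item, then other people's likes.

```python
import sqlite3

# In-memory SQLite stands in for MSSQL; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE likes (
    user_id INTEGER NOT NULL REFERENCES users(id),
    item_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, item_id)
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")])
conn.executemany("INSERT INTO likes VALUES (?, ?)",
                 [(1, 10), (1, 11), (1, 12),
                  (2, 10), (2, 11),
                  (3, 12),
                  (4, 99)])

def who_likes_what_i_like(conn, user_id):
    """Return (name, shared_count) for users who share at least one liked item."""
    # One "hop" (me -> item -> them) is one self-join on likes;
    # every further hop would add another pair of joins.
    return conn.execute("""
        SELECT u.name, COUNT(*) AS shared
        FROM likes AS mine
        JOIN likes AS theirs
          ON theirs.item_id = mine.item_id AND theirs.user_id <> mine.user_id
        JOIN users AS u ON u.id = theirs.user_id
        WHERE mine.user_id = ?
        GROUP BY u.id, u.name
        ORDER BY shared DESC, u.name
    """, (user_id,)).fetchall()

print(who_likes_what_i_like(conn, 1))  # [('bob', 2), ('carol', 1)]
```

At this depth the relational version is a single, index-friendly join, which is why the hop count matters so much in the comparison.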


Solution

  • You haven't given much information, and your question is likely to elicit a lot of opinion, but I'll try to give it a fair shake. (Disclaimer: I'm from the neo4j side of this, but I've worked with most of the other things you mention.)

    Your question has three elements I want to split apart:

    1. Graph or Relational? (MSSQL vs. Neo4j)
    2. Driver/Engineering issues (Neo4jClient/REST vs. EF/MSSQL)
    3. Modeling practicalities (implementing the social graph "the hard way" vs. in neo4j)

    Graph or Relational?

    You should read another answer I posted about the general performance characteristics of graph databases and graph database queries. I won't recap all of it (since it's already on SO), but here's the executive summary: graph databases are very good and fast at path-associative queries, where you need to traverse many edges. Those operations correspond to cases in the relational world where you'd join a whole pile of tables together, or where the join depth is variable. In those situations, graph will generally beat relational performance-wise. If you want to do bulk scans of users or single joins, you're probably better off with relational (again, see the other answer for more detail). So by this criterion the deciding factor is query depth, and I'm inferring that you mostly want to traverse one edge at a time (e.g. "Show me all of the stuff that Bob likes") and don't need deeper queries like "Show me everyone who is separated by 3-4 degrees from Bob".
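    The depth distinction can be illustrated with a plain breadth-first traversal over an in-memory adjacency map. The `follows` data and the `within_degrees` function are hypothetical, and this sketches only the access pattern, not how Neo4j implements traversals internally. Note that `max_depth` is just a parameter here, whereas in SQL each extra degree means another self-join (or a recursive CTE).

```python
from collections import deque

def within_degrees(follows, start, max_depth):
    """Return {person: hop_count} for everyone reachable from `start`
    in 1..max_depth hops over the `follows` adjacency map."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if dist[person] == max_depth:
            continue  # don't expand past the depth limit
        for nxt in follows.get(person, ()):
            if nxt not in dist:  # first visit in BFS = fewest hops
                dist[nxt] = dist[person] + 1
                queue.append(nxt)
    del dist[start]  # exclude the starting person from the result
    return dist

# Hypothetical "follows" graph.
follows = {
    "bob": ["ann", "cat"],
    "ann": ["dan"],
    "cat": ["dan", "eve"],
    "dan": ["fay"],
    "fay": ["gus"],
}
print(within_degrees(follows, "bob", 3))
# {'ann': 1, 'cat': 1, 'dan': 2, 'eve': 2, 'fay': 3}
```

    Going from 3 to 4 degrees here means changing one argument; each step only touches the neighbors of nodes already reached, which is the kind of workload where a graph database tends to pull ahead.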

    Driver/Engineering Issues

    Speed-wise, it's generally known that the Java API is faster than the REST API for neo4j. REST API performance varies and depends on many other factors, such as whether the DB is hosted on the same machine or how "network far" away it is. REST always carries extra overhead, such as HTTP and JSON serialization/deserialization, that you wouldn't have with the Java API. So, all other things being equal (disclaimer: they never are ;), the REST API will generally be slower than something like EF.
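    To see where that overhead comes from, here is a minimal Python sketch of the round trip a single query makes over REST: the Cypher text and parameters are serialized to JSON, sent over HTTP, and the JSON result is parsed back. It assumes the Neo4j 2.x transactional Cypher endpoint (`/db/data/transaction/commit`) on a local server; Neo4jClient's actual requests may use a different endpoint or shape, and the `User`/`LIKES` labels are hypothetical. The request is built but never sent, and a canned response stands in for the server so the sketch runs offline.

```python
import json
import urllib.request

# Assumed: Neo4j 2.x transactional Cypher endpoint on a local server.
URL = "http://localhost:7474/db/data/transaction/commit"

# Cypher in Neo4j 2.x uses {name} placeholders for parameters.
QUERY = ("MATCH (me:User {id: {userId}})-[:LIKES]->(thing)<-[:LIKES]-(other) "
         "RETURN other.name AS name, count(thing) AS shared "
         "ORDER BY shared DESC")

def build_request(user_id):
    """Serialize one Cypher statement into an HTTP POST (not sent here)."""
    body = {"statements": [{"statement": QUERY,
                            "parameters": {"userId": user_id}}]}
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json; charset=UTF-8"},
        method="POST",
    )

def parse_rows(raw_bytes):
    """Deserialize the JSON response back into Python tuples."""
    payload = json.loads(raw_bytes.decode("utf-8"))
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    result = payload["results"][0]
    return [tuple(item["row"]) for item in result["data"]]

req = build_request(1)
# Canned response in the assumed endpoint's shape, so nothing hits the network.
fake_response = json.dumps({
    "results": [{"columns": ["name", "shared"],
                 "data": [{"row": ["bob", 2]}, {"row": ["carol", 1]}]}],
    "errors": [],
}).encode("utf-8")
print(req.get_method(), len(req.data), "bytes out")
print(parse_rows(fake_response))  # [('bob', 2), ('carol', 1)]
```

    Every query pays for `json.dumps`, the HTTP round trip, and `json.loads`; an embedded Java API call runs in-process and skips all three, which is the gap described above.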

    Modeling Practicalities

    Here, neo4j is going to win by a lot. With MSSQL, you'll have the ever-present object-relational impedance mismatch; neo4j lessens (but does not eliminate) those impedance mismatch problems. Modeling-wise, neo4j is schemaless, which comes with lots of pros and cons. You can probably cobble together a working model faster with neo4j because your domain is fundamentally graph-y.
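    A toy illustration of the schemaless point: relationship types and properties are just data, so adding comments later doesn't require a schema change, whereas in EF/MSSQL it would typically mean a new table plus a migration. `PropertyGraph` is a hypothetical in-memory class, not Neo4j's API; it shows the modeling shape only. It also shows the "cons" side: nothing stops a typo such as `COMENTED_ON` from silently creating a new relationship type.

```python
from collections import defaultdict

class PropertyGraph:
    """Hypothetical toy schemaless graph: nodes and relationships
    carry arbitrary key/value properties."""

    def __init__(self):
        self.nodes = {}               # node id -> property dict
        self.out = defaultdict(list)  # node id -> [(rel_type, target, props)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, src, rel_type, dst, **props):
        # No schema to migrate: any new rel_type or property is accepted.
        self.out[src].append((rel_type, dst, props))

    def neighbors(self, src, rel_type):
        """Targets of outgoing relationships of the given type."""
        return [dst for t, dst, _ in self.out[src] if t == rel_type]

g = PropertyGraph()
g.add_node("alice", kind="User")
g.add_node("bob", kind="User")
g.add_node("post1", kind="Post", text="hello")
g.relate("alice", "FOLLOWS", "bob")
g.relate("bob", "WROTE", "post1")
# A relationship type added later, with its own properties:
g.relate("alice", "COMMENTED_ON", "post1", text="nice!")
print(g.neighbors("alice", "COMMENTED_ON"))  # ['post1']
```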