Search code examples
graphneo4jtime-complexityrelational-databasesql-graph

Perfomance SQL Server 2017 Graph vs Neo4j


I am researching about graph databases. I stumbled into SQL Server 2017 and learned that they added the option to use a graph database. But I have some uncertainties about the performance. I watched several Youtube videos, tutorials and papers about this SQL Server 2017 Graph. For example this page.

Image, lookup in SQL-server

With the image above in mind. When I try to find a node, is it true that the time complexity is O(n)? And is the performance in other graph databases like Neo4j similar? I am only talking about node lookup and not shortest path algorithms etc.

I also have a feeling that the graph functionality in SQL Server is just a relational database in disguise. Is this correct?

Thanks in advance.


Solution

  • There is a big difference between a graph database and a relational database with graph capabilities, in the sense of how the data is stored.

    To summarise simply, when a triple ( aka 2 nodes connected by a relationship ) is stored, the underlying database difference will be :

    • Neo4j, the triple is stored as a graph on disk, nodes have pointers to the relationships they have, so during retrieval it will just be pointer chasing from nodes
    • SQL like : one node is stored in one table, the other node is stored in another table, yet you can query as a graph but the operation will be really making a JOIN

    Based on those two facts, we can say that in native graphs the join is performed at write time compared to having joins at query time in non-native graphs.

    Be very careful when you hear distributed graphs, partitions, planet scale and the like. If you start having relationships that have to be traversed over the network you will always suffer performance issues. Most of the distributed graphs platforms note also that for maximum performance you have to store everything on one partition (which defeats the partitioning purpose).