graph neo4j cypher neo4jphp

Neo4j - family graph design and ancestor/pedigree lookup


I just started playing around with Neo4j, so my apologies if this is a simple concept...

I'm building a relatively large database of family information (a few million nodes with about 5-15 properties per node). Right now, all data is stored in a MySQL database with Redis as a caching layer, but I'm playing around with swapping out Redis for Neo4j to help speed up some of our more expensive queries (and eventually using Neo4j as the main data store instead of MySQL).

I'm playing around with storing all my nodes and their properties in Neo4j, and connecting them via HAS_FATHER and HAS_MOTHER relationships. Is this a good approach? Would it be more beneficial to use HAS_PARENT and set a parent_type property on each relationship to either father or mother? Should I also save a reverse relationship called HAS_CHILD on all parents? What are the pros and cons of my options?

Secondly, assuming that I'm using the HAS_FATHER and HAS_MOTHER relationships, what's the optimal query to grab all nodes, properties, and relationships for all direct ancestors (pedigree) 7 generations away? Here's an example query that I'm currently playing with, but I'm new to Cypher and I'm not too familiar with the bottlenecks, optimizations, etc.

MATCH tree = (c)-[:HAS_FATHER|HAS_MOTHER*0..7]->(p)
WHERE c.id = 29421
RETURN nodes(tree), rels(tree)

Any help or tips would be appreciated. Thanks!


Solution

  • Having HAS_MOTHER and HAS_FATHER instead of a generic HAS_PARENT with a type property is definitely better. With more specific relationship types, a query for mothers, for example, doesn't need to dig into properties: the traversal can rely solely on the relationship type.

    This performs better because properties are lazily loaded on demand, whereas the relationship type is available without loading any properties; see http://neo4j.com/docs/stable/performance-guide.html#_neo4j_primitives_lifecycle.

    Semantically inverse relationships don't have to be modeled explicitly: if a is the mother of b, then b is a child of a. So to query children, just follow HAS_FATHER and HAS_MOTHER in the inverse direction.
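As a rough sketch of the two points above (untested against your data), the queries might look something like this. The id property and the value 29421 are taken from the question; the HAS_PARENT / parent_type variant is the alternative model from the question and is shown only for comparison.

```cypher
// Mothers only, typed relationships: the relationship type alone selects the match.
MATCH (c)-[:HAS_MOTHER]->(m)
WHERE c.id = 29421
RETURN m;

// Mothers only, generic relationship (alternative model from the question):
// parent_type has to be read on every HAS_PARENT relationship that is considered.
MATCH (c)-[r:HAS_PARENT]->(m)
WHERE c.id = 29421 AND r.parent_type = 'mother'
RETURN m;

// Children of a person: same relationships, followed against their direction,
// so no separate HAS_CHILD relationship is needed.
MATCH (p)<-[:HAS_FATHER|HAS_MOTHER]-(child)
WHERE p.id = 29421
RETURN child;
```

The last query could also be given a variable-length pattern such as *1..7, as in the question, to walk several generations of descendants instead of just one.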