
Neo4j/Graph (in general) Enumerator Modeling Best Practice


I'm curious what the best way is to model enumerators in Neo4j. Should they be nodes, relationships, properties, etc.?

enum Activity {
    BASKETBALL,  // ID: 1
    HOCKEY // ID: 2
}

For example, in SQL I could just make an enum (lookup) table and have foreign keys (ID: 1, 2) pointing to it. Should I have a node for each entry (BASKETBALL, HOCKEY) that would have been in that SQL enum table, or should the value be a label or a property? Are there performance impacts from having, say, thousands or millions of nodes pointing to that one enum node, or is that not really a concern?

I understand there might be cases for each, and if so, please explain when to use which.


Solution

  • For this kind of modeling, nodes are the best approximation, with the label being the type, and a property on each for the value.

    To model your enum example you might have:

    (:Activity{name:'BASKETBALL'})
    (:Activity{name:'HOCKEY'})
    

    Then you can have relationships to these nodes as appropriate:

    (:Person{name:'Matt'})-[:INTERESTED_IN]->(:Activity{name:'HOCKEY'})
    

    This model works well for most kinds of queries (give me information about Matt, including which activities he's interested in; is Matt interested in hockey?; which people are interested in hockey?).
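To make the shape of this model concrete without a running database, here is a minimal in-memory sketch in Python. It is not the Neo4j API: the `Graph` class, its methods, and the extra sample person "Alice" are hypothetical, assumed only to show how the three example questions map onto traversals over one node per enum value.

```python
from collections import defaultdict


class Graph:
    """Toy property graph: labeled nodes with properties, typed directed relationships.

    Hypothetical illustration only; this is not the Neo4j driver API.
    """

    def __init__(self):
        self.nodes = {}                    # node id -> (label, properties)
        self.out_rels = defaultdict(list)  # (node id, rel type) -> target ids
        self.in_rels = defaultdict(list)   # (node id, rel type) -> source ids

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def relate(self, src, rel_type, dst):
        # Index the relationship from both ends so it can be traversed either way.
        self.out_rels[(src, rel_type)].append(dst)
        self.in_rels[(dst, rel_type)].append(src)

    def find(self, label, **props):
        # Return the id of the first node with this label and matching properties.
        for node_id, (node_label, node_props) in self.nodes.items():
            if node_label == label and all(node_props.get(k) == v for k, v in props.items()):
                return node_id
        return None

    def name(self, node_id):
        return self.nodes[node_id][1]["name"]


g = Graph()
# One node per enum value: the label is the type, the name property is the value.
g.add_node("a1", "Activity", name="BASKETBALL")
g.add_node("a2", "Activity", name="HOCKEY")
# Hypothetical sample people.
g.add_node("p1", "Person", name="Matt")
g.add_node("p2", "Person", name="Alice")
g.relate("p1", "INTERESTED_IN", "a2")
g.relate("p2", "INTERESTED_IN", "a1")
g.relate("p2", "INTERESTED_IN", "a2")

matt = g.find("Person", name="Matt")
hockey = g.find("Activity", name="HOCKEY")

# Which activities is Matt interested in? (person -> activity)
matt_activities = sorted(g.name(a) for a in g.out_rels[(matt, "INTERESTED_IN")])
# Is Matt interested in hockey?
matt_likes_hockey = hockey in g.out_rels[(matt, "INTERESTED_IN")]
# Which people are interested in hockey? (activity -> person)
hockey_fans = sorted(g.name(p) for p in g.in_rels[(hockey, "INTERESTED_IN")])

print(matt_activities)    # ['HOCKEY']
print(matt_likes_hockey)  # True
print(hockey_fans)        # ['Alice', 'Matt']
```

The first two questions start from the person and follow outgoing relationships; the third starts from the HOCKEY node and follows incoming ones. Which end a query starts from is what determines its cost.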

    If thousands or millions of nodes are connected to an enum node, the performance impact depends on the direction you traverse. If each person has only one (or a few) relationships to :Activity nodes, a query from persons to activities will be cheap.

    However, a query from an activity to persons may be more expensive. For example, if your hockey node has millions of connections, this kind of query could be a problem:

    ...
    // previously matched (p:Person) to all students at a school
    // per student, find who else has a common interest in an activity
    MATCH (p)-[:INTERESTED_IN]->()<-[:INTERESTED_IN]-(personWithCommonInterest)
    ...
    

    The first traversal in the match is cheap, since each person is interested in only a few things, but the second can be much more expensive, because a great many people are interested in the same thing.
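The asymmetry can be illustrated by counting how many relationships each hop of that pattern has to walk. The Python sketch below is a toy cost model, not a measurement of Neo4j's storage or query planner: the population of 100,000 people, the rule that everyone likes HOCKEY and every 10th person also likes BASKETBALL, and the 500-student school are all hypothetical numbers chosen only to show the fan-out.

```python
from collections import defaultdict

# Hypothetical toy data: every person likes HOCKEY, every 10th person also BASKETBALL.
NUM_PEOPLE = 100_000

interests = defaultdict(list)   # person -> activities (outgoing INTERESTED_IN)
interested = defaultdict(list)  # activity -> people (incoming INTERESTED_IN)

for person in range(NUM_PEOPLE):
    activities = ["HOCKEY", "BASKETBALL"] if person % 10 == 0 else ["HOCKEY"]
    for activity in activities:
        interests[person].append(activity)
        interested[activity].append(person)


def common_interest_expansions(person):
    """Count relationships walked by
    (p)-[:INTERESTED_IN]->()<-[:INTERESTED_IN]-(other) for one starting person."""
    # First hop: follow the person's few outgoing relationships.
    first_hop = len(interests[person])
    # Second hop: walk back through every incoming relationship of each activity.
    second_hop = sum(len(interested[a]) for a in interests[person])
    return first_hop, second_hop


print(common_interest_expansions(1))  # (1, 100000)
print(common_interest_expansions(0))  # (2, 110000)

# The query above does this once per matched student; for a hypothetical
# school of 500 students (ids 0-499), the second hop alone adds up to:
total_second_hop = sum(common_interest_expansions(p)[1] for p in range(500))
print(total_second_hop)  # 50500000
```

The first hop stays at one or two relationships no matter how large the graph grows, while the second hop scales with the number of people attached to each enum node, and is repeated for every starting person.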