Best practice for modeling enums in Neo4j (and graph databases in general)

I'm curious what the best way to model enums is in Neo4j. Should they be nodes, relationships, properties, etc.?

enum Activity {
    BASKETBALL,  // ID: 1
    HOCKEY // ID: 2
}

For example, in SQL I could just make an enum table and have foreign keys (IDs 1, 2) pointing to that lookup table. Should I have a node for each entry (BASKETBALL, HOCKEY) that would have been in that SQL enum table, or should the value be a label or a property? Is there a performance impact from having, say, thousands or millions of nodes pointing to that one enum node, or is it not really a concern?

I understand there might be cases for each, and if so, please explain when to use which.

Solution

For this kind of modeling, nodes are the closest fit: the label plays the role of the enum type, and a property on each node holds the value.

To model your enum example you might have:

(:Activity{name:'BASKETBALL'})
(:Activity{name:'HOCKEY'})
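
As a minimal sketch, the enum nodes could be created once up front. The constraint name activity_name_unique is an assumption made for illustration, and the constraint syntax shown is the newer FOR ... REQUIRE form (older Neo4j versions use CREATE CONSTRAINT ON ... ASSERT ... instead). MERGE is used so that rerunning the statements does not create duplicate enum nodes.

```cypher
// Hypothetical constraint name; guarantees one node per enum value.
CREATE CONSTRAINT activity_name_unique IF NOT EXISTS
FOR (a:Activity) REQUIRE a.name IS UNIQUE;

// Create each enum value once; MERGE does nothing if the node already exists.
UNWIND ['BASKETBALL', 'HOCKEY'] AS activityName
MERGE (:Activity {name: activityName});
```

The uniqueness constraint also gives lookups by name an index to use, so finding the HOCKEY node does not require scanning every :Activity node.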

Then you can have relationships to these nodes as appropriate:

(:Person{name:'Matt'})-[:INTERESTED_IN]->(:Activity{name:'HOCKEY'})
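
The pattern above describes the shape of the data. If it were written as a single CREATE, it would create a new :Activity node every time instead of reusing the shared one. A sketch that connects a person to the existing enum node, assuming the name property identifies both people and activities:

```cypher
// Find (or create) the shared enum node and the person,
// then connect them without duplicating the relationship.
MERGE (a:Activity {name: 'HOCKEY'})
MERGE (p:Person {name: 'Matt'})
MERGE (p)-[:INTERESTED_IN]->(a);
```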

This works well for most kinds of queries (Give me information about Matt, including what activities he's interested in; Is Matt interested in hockey?; Which people are interested in hockey?).
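
Those three questions could be written roughly as follows. This is a sketch against the :Person / :Activity model above; the returned column names are chosen for illustration.

```cypher
// Information about Matt, including the activities he is interested in.
// OPTIONAL MATCH keeps Matt in the result even if he has no interests.
MATCH (p:Person {name: 'Matt'})
OPTIONAL MATCH (p)-[:INTERESTED_IN]->(a:Activity)
RETURN p, collect(a.name) AS activities;

// Is Matt interested in hockey? Returns a single true/false row.
MATCH (:Person {name: 'Matt'})-[:INTERESTED_IN]->(:Activity {name: 'HOCKEY'})
RETURN count(*) > 0 AS interestedInHockey;

// Which people are interested in hockey?
MATCH (p:Person)-[:INTERESTED_IN]->(:Activity {name: 'HOCKEY'})
RETURN p.name AS person;
```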

In a case where you may have thousands or millions of nodes connected to the enum node, the performance impact depends on the direction you're traversing. If a single person has only one (or a few) relationships to :Activity nodes, then a query from persons to activities will be cheap.

However, a query from the activity to persons may be more expensive. For example, if your hockey node has millions of connections, this kind of query could be a problem:

...
// previously matched (p:Person) to all students at a school
// per student, find who else has a common interest in an activity
MATCH (p)-[:INTERESTED_IN]->()<-[:INTERESTED_IN]-(personWithCommonInterest)
...

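The first traversal in the match is cheap, since each person is interested in only a few things. The second can be far more expensive, because a great many people are interested in the same thing: if n people are connected to the hockey node, every student interested in hockey expands to roughly n candidate matches.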
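
One way to judge whether this matters for a particular graph is to check how many relationships each enum node actually has. A sketch, again assuming the :Activity / :INTERESTED_IN model above:

```cypher
// Count incoming INTERESTED_IN relationships per enum value.
// A very large count marks a dense node, where any traversal that
// passes through the activity fans out to that many people.
MATCH (a:Activity)
OPTIONAL MATCH (a)<-[r:INTERESTED_IN]-()
RETURN a.name AS activity, count(r) AS interestedPeople
ORDER BY interestedPeople DESC;
```

Prefixing a query with PROFILE shows the rows and database hits produced by each step, which makes the cost of the second traversal visible on real data.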