I'm new to neo4j and graph databases in general, so this might be a dumb question, but what is the best way to refer to nodes by id, or else what is the best way to form a relation between an existing node and a recently inserted one?
Right now, I have a list of keywords as individual nodes in my graph, and I'm collecting incoming tweets and then forming relations between the user, tweets, and tracked keywords. In order to store the keyword nodes locally by id, I'm using a dictionary, with the keywords as keys, and node ids as values, in order to populate this cypher query:
RELATE_TWEET_TO_KEYWORD = """\
MATCH (a:Tweet), (b:Keyword)
WHERE a.id = {tweet_id} AND id(b) = {keyword_id}
CREATE (a)-[r:REFERENCED]->(b)
RETURN r
"""
The keywords are updated very infrequently, so I simply have a periodic celery task that pickles an updated keyword dictionary every week.
Is there a better or more efficient way to do this? I'm also trying to minimize calls to the server.
Thanks.
You can create a uniqueness constraint on a node property and treat that property as the id. You should not use the Neo4j internal id in external systems, because that id can be reused after nodes are deleted.
For example:
CREATE CONSTRAINT ON (k:Keyword) ASSERT k.word IS UNIQUE;
You can then treat the word property as a unique id for Keyword nodes. This also creates an index on the unique property, ensuring that lookups by that property are efficient.
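With that constraint in place, the query from the question could match the keyword by its word property rather than by the internal id, which would also remove the need for the pickled id dictionary. A minimal sketch, assuming the same {param} placeholder style as the question (newer Neo4j versions use $param instead); relate_params, relate_tweet_to_keyword, and run_query are hypothetical names, and run_query stands in for whatever call your driver uses to execute a query with parameters:

```python
# Match the keyword by its unique `word` property instead of id(b).
# Placeholder syntax ({tweet_id}) mirrors the question; adjust for your Neo4j version.
RELATE_TWEET_TO_KEYWORD = """\
MATCH (a:Tweet), (b:Keyword)
WHERE a.id = {tweet_id} AND b.word = {word}
CREATE (a)-[r:REFERENCED]->(b)
RETURN r
"""


def relate_params(tweet_id, word):
    """Build the parameter dict for the query (hypothetical helper)."""
    return {"tweet_id": tweet_id, "word": word}


def relate_tweet_to_keyword(run_query, tweet_id, word):
    """Run the query through a caller-supplied function.

    run_query is an assumption: any callable taking (query, params),
    for example a thin wrapper around your driver's execute method.
    """
    return run_query(RELATE_TWEET_TO_KEYWORD, relate_params(tweet_id, word))
```

Because the keyword itself is the lookup key, this is still a single call to the server per relation, and the index created by the constraint keeps the lookup on b.word cheap.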