neo4j, cypher, py2neo

Check whether nodes exist in a neo4j graph


NOTE

I let this become several questions instead of the simple one I asked, so I am breaking the follow-ups off into their own question here.

ORIGINAL QUESTION

I'm receiving a list of IDs. I first test whether any of them are in my graph, and if they /are/, I process those nodes further.

So, for example...

from py2neo import neo4j

def id_is_in_graph(id):
    # Runs one Cypher query per ID; True if a matching User node exists.
    # graph_db is assumed to be an already-open py2neo graph database handle.
    query = """MATCH (user:User {{id_str:"{}"}})
    RETURN user
    """.format(id)
    n = neo4j.CypherQuery(graph_db, query).execute_one()
    return bool(n)

fids = get_fids(record)  # [100001, 100002, 100003, ... etc]
ids_in_my_graph = filter(id_is_in_graph, fids)  # [100002]

As you can imagine, using filter to test sequentially whether each ID is in my graph is really, really slow, and clearly isn't using neo4j properly.

How would I rephrase my query so that I could pass in a list, something like (User{id_str: [mylist]}), and get back only the IDs that are in my graph?


Solution

  • You may want to use WHERE...IN, which takes advantage of the collection functionality of Cypher. Here's the relevant reference.

    So your query might look like this:

    MATCH (user:User)
    WHERE user.id_str IN ["100001", "100002", "100003"]
    RETURN user;

    Now, I don't know how large a collection can be, and I doubt this would work well if your collection had 1,000 items in it. But at least this gives you a way of batching the IDs up into chunks, which should improve performance.

    Also have a look at the Collections section of the Cypher 2.0 refcard
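To make the batching idea concrete, here is a rough Python sketch. It is not tied to any particular driver: `run_query` is a hypothetical callable that you would write yourself around your client library, taking a Cypher string plus a parameter dict and returning the matching `id_str` values. The batch size of 500 is an arbitrary guess, and the `{ids}` placeholder is the Cypher 2.0 parameter syntax (newer Neo4j versions use `$ids` instead).

```python
# Cypher 2.0-style parameter placeholder; newer Neo4j versions use $ids.
QUERY = """MATCH (user:User)
WHERE user.id_str IN {ids}
RETURN user.id_str"""


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def ids_in_graph(fids, run_query, batch_size=500):
    """Return the subset of `fids` that exist in the graph, in input order.

    `run_query` is a hypothetical helper (not part of any library): it takes
    (query, params) and returns an iterable of id_str values that matched.
    """
    id_strs = [str(fid) for fid in fids]
    found = set()
    # One query per batch instead of one query per ID.
    for batch in chunked(id_strs, batch_size):
        found.update(run_query(QUERY, {"ids": batch}))
    return [fid for fid in fids if str(fid) in found]
```

With this shape, `ids_in_graph(get_fids(record), run_query)` would stand in for the `filter(...)` call in the question, issuing one query per batch rather than one per ID.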