neo4j, py2neo

Best way of keeping track of logs within a relationship in neo4j


I am frankly quite new to Neo4j. After reading through a lot of the documentation, I was wondering what the best way would be to store visit "logs" with a datatype such as a timestamp. For instance, I have the following relation: (u:User)-[:Visited]->(p:Park). Should I create a list attribute for Visited containing multiple timestamps? Or should I make multiple "Visited" Relationships between the two entities containing each a unique timestamp? Generating multiple relationships between the two entities seems like overhead. I feel as if I am missing a key concept in using this type of database. Many thanks,


Solution

  • Or should I make multiple "Visited" Relationships between the two entities containing each a unique timestamp?

    Generating multiple relationships is fine -- graph databases are tailored for this sort of workload, so they handle it efficiently. This way, adding and removing visits is quite simple. For example, if you identify the user and the park with an id, queries like these will work.

    For adding a new visit:

    MATCH (u:User {id: $userId}), (p:Park {id: $parkId})
    CREATE (u)-[:VISITED {timestamp: $timestamp}]->(p)
    

    For deleting a visit:

    MATCH (:User {id: $userId})-[v:VISITED {timestamp: $timestamp}]->(:Park {id: $parkId})
    DELETE v
    

    Querying all timestamps of a user's visits to a park is also easy:

    MATCH (:User {id: $userId})-[v:VISITED]->(:Park {id: $parkId})
    RETURN collect(v.timestamp)
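
Since the question is tagged with a Python client, here is a minimal sketch of how two of these queries might be called from Python. It assumes `session` behaves like a session of the official `neo4j` Python driver (`session.run(query, **params)` returning a result with `.single()`); the helper names `add_visit` and `list_visits` are hypothetical, and the `AS timestamps` alias is added only to give the returned column a stable name.

```python
# Cypher statements from the answer, kept as module-level constants.
ADD_VISIT = (
    "MATCH (u:User {id: $userId}), (p:Park {id: $parkId}) "
    "CREATE (u)-[:VISITED {timestamp: $timestamp}]->(p)"
)
LIST_VISITS = (
    "MATCH (:User {id: $userId})-[v:VISITED]->(:Park {id: $parkId}) "
    "RETURN collect(v.timestamp) AS timestamps"
)


def add_visit(session, user_id, park_id, timestamp):
    """Create one VISITED relationship for one visit (hypothetical helper)."""
    # Values travel as query parameters; they are never formatted into the string.
    session.run(ADD_VISIT, userId=user_id, parkId=park_id, timestamp=timestamp)


def list_visits(session, user_id, park_id):
    """Return all visit timestamps for one user/park pair (hypothetical helper)."""
    result = session.run(LIST_VISITS, userId=user_id, parkId=park_id)
    # collect() aggregates everything into a single row, so single() is enough.
    return result.single()["timestamps"]
```

With a real driver, `session` would typically come from `GraphDatabase.driver(uri, auth=(user, password)).session()`; the URI and credentials depend on your setup.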
    

    Should I create a list attribute for Visited containing multiple timestamps?

    A list property would work on paper, but it makes queries quite cumbersome. Adding a visit, for example, becomes:

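    MATCH (u:User {id: $userId}), (p:Park {id: $parkId})
    // MERGE creates the relationship on the first visit; a plain MATCH would find nothing to update
    MERGE (u)-[v:VISITED]->(p)
    SET v.timestamps = coalesce(v.timestamps, []) + [$timestamp]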
    

    (The coalesce function returns the first non-null value, so if the timestamps property is not set yet, it returns an empty list to start with.)

    Of course, this representation makes querying all timestamps even simpler:

    MATCH (u:User {id: $userId})-[v:VISITED]->(p:Park {id: $parkId})
    RETURN coalesce(v.timestamps, [])
    

    However, checking whether a certain user-timestamp-park visit happened gets more difficult and (presumably) a lot slower:

    MATCH (u:User {id: $userId})-[v:VISITED]->(p:Park {id: $parkId})
    WHERE $timestamp IN v.timestamps
    RETURN v
    

    Also, removing a timestamp is no longer trivial:

    MATCH (u:User {id: $userId})-[v:VISITED]->(p:Park {id: $parkId})
    SET v.timestamps = [timestamp IN v.timestamps WHERE timestamp <> $timestamp]
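
To make the semantics of the list-based queries concrete, the following Python sketch roughly mirrors them on a plain dictionary that stands in for the relationship's properties. Nothing here talks to Neo4j, and the function names are hypothetical; it only illustrates what each statement does to the list.

```python
def add_timestamp(props, timestamp):
    # Mirrors: SET v.timestamps = coalesce(v.timestamps, []) + [$timestamp]
    current = props.get("timestamps")
    props["timestamps"] = ([] if current is None else current) + [timestamp]


def has_timestamp(props, timestamp):
    # Mirrors: WHERE $timestamp IN v.timestamps (a linear scan of the list)
    return timestamp in (props.get("timestamps") or [])


def remove_timestamp(props, timestamp):
    # Mirrors: SET v.timestamps = [t IN v.timestamps WHERE t <> $timestamp]
    # The whole list is rebuilt, and every occurrence of the value is dropped.
    props["timestamps"] = [t for t in (props.get("timestamps") or []) if t != timestamp]
```

Every operation reads or rewrites the full list, which is why this representation gets less convenient as the number of visits per user and park grows.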
    

    A note on timestamps. Neo4j versions before 3.4 have no native temporal type. Common workarounds include storing epoch time or a string in a fixed format, e.g. ISO 8601. Since version 3.4, Cypher offers native temporal types such as DateTime. If your use case requires handling timestamps in a more sophisticated way, consider the conversion functions offered by the APOC library.
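
As an illustration of the epoch-time and ISO 8601 representations, here is a small sketch using only Python's standard library to produce values that could be passed as the `$timestamp` parameter. The function names are hypothetical, and the sketch assumes timezone-aware datetimes normalized to UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to integer milliseconds since the epoch."""
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(ms):
    """Convert integer epoch milliseconds back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def to_iso8601(dt):
    """Format a timezone-aware datetime as an ISO 8601 string in UTC."""
    return dt.astimezone(timezone.utc).isoformat()


def from_iso8601(text):
    """Parse a string produced by to_iso8601 back into a datetime."""
    return datetime.fromisoformat(text)


visit = datetime(2020, 1, 1, 12, 30, tzinfo=timezone.utc)
print(to_epoch_millis(visit))  # 1577881800000
print(to_iso8601(visit))       # 2020-01-01T12:30:00+00:00
```

Epoch integers compare numerically, and UTC strings written in one consistent format sort chronologically as plain text, so either form should work with range filters in a `WHERE` clause.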