Tags: performance, neo4j, scalability, data-modeling, spring-data-neo4j-4

Neo4j data modeling: private owned nodes, rich relationships, locks


Versions used: Neo4j 3.0.6 with Spring-data-neo4j 4.2.0.M1 for POJO mapping

I'm trying to choose how to model data with neo4j and compare benefits/drawbacks of different solutions.

Requirements:

  • A Movie has a dynamic list of metadata (a metadata has 3 properties: 'key', 'value', 'locale'). The number of metadata for a movie is not known in advance, neither are the possible keys. They have to be separable from the other Movie technical properties because they are localized and considered as business data.
  • Metadata are owned by the Movie and always accessed from the Movie. They cannot be shared with other movies
  • Fast fetch queries must be possible on metadata values

Movie metadata example:

Movie metadata
  locale 'en_GB':
    title: 'Jurassic Park'
    description: 'description in english'
  locale 'fr_FR':
    description: 'description en francais'
  locale 'none':
    actor: 'Jeff Goldblum'

Solution A

  • One node per metadata (with 3 properties per node: 'key', 'value', 'locale')
  • Drawback: private owned concept to be implemented (delete of Metadata orphan nodes to be managed manually because not supported by spring-data-neo4j/neo4j-ogm)
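As a rough sketch of Solution A in Cypher, using the example metadata above: the labels `Movie` and `Metadata`, the relationship type `METADATA` and the `id` property are assumptions made for illustration, not names fixed by the requirements. The second statement shows one way the "private owned" cleanup could be handled by hand, deleting a movie together with its metadata nodes in a single statement.

```cypher
// Hypothetical names: Movie, Metadata, METADATA, id
CREATE (m:Movie {id: 1})
CREATE (m)-[:METADATA]->(:Metadata {key: 'title', value: 'Jurassic Park', locale: 'en_GB'})
CREATE (m)-[:METADATA]->(:Metadata {key: 'description', value: 'description in english', locale: 'en_GB'})
CREATE (m)-[:METADATA]->(:Metadata {key: 'description', value: 'description en francais', locale: 'fr_FR'})
CREATE (m)-[:METADATA]->(:Metadata {key: 'actor', value: 'Jeff Goldblum', locale: 'none'});

// Manual "private owned" delete: remove the movie and every metadata node it owns
MATCH (m:Movie {id: 1})
OPTIONAL MATCH (m)-[:METADATA]->(md:Metadata)
DETACH DELETE m, md;
```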

Solution B

  • One unique node per locale (with 1 property: 'locale') (example: 'en_GB')
  • Metadata as rich relationships (with 2 relationship properties: 'key', 'value')
  • Drawback: to create the relationship, a lock must be taken on Locale node
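A minimal Cypher sketch of Solution B, again with hypothetical names (`Movie`, `Locale`, `METADATA`, `id`). Creating the relationship write-locks both of its end nodes for the duration of the transaction, which is where the contention on the shared Locale node comes from.

```cypher
// One unique node per locale (hypothetical label and property names)
CREATE CONSTRAINT ON (l:Locale) ASSERT l.locale IS UNIQUE;

// Attach one metadata entry to a movie as a rich relationship.
// The CREATE below locks both the Movie node and the shared Locale node
// until the transaction commits.
MERGE (l:Locale {locale: 'en_GB'})
WITH l
MATCH (m:Movie {id: 1})
CREATE (m)-[:METADATA {key: 'title', value: 'Jurassic Park'}]->(l);
```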

Does anyone have experience with solution B? How bad is it to need a lock on a node that will be shared by millions of other nodes? What is the impact on performance and scalability?

Does anyone have a better modeling solution?


Solution

  • tl;dr: go with approach A. Don't bother with orphaned :Locale nodes except for periodic cleanup; they will have no effect on query performance.

    Your approach 'A' is by far the better solution. You do need to move that data off of the :Movie node, you are correct, because it will have to be either a nested Map or a list of Maps, neither of which is supported by Node properties. For storage, you could convert these to a Map of lists, but that will be very difficult to query, much less query quickly. Your concern about "orphaned" nodes is insubstantial; it will affect query performance and data size trivially if at all, and is incredibly easy to clean up periodically to ease your mind in any case.

    MATCH (x:Locale) WHERE NOT (x)<-[:METADATA]-() DETACH DELETE x
    

    Do that once a month, or never even, it really won't affect you much. Your query is already constrained by the rest of the path, so unless orphaned :Locale nodes are going to outnumber attached ones substantially, you're only adding a small percentage to what is already likely the largest set in your query, which will also be dropped by query operation on the first pass.

    As for locking, it will only affect write queries anyway, and only while a write transaction is open. You can run a million read-only queries while the write is going on and nothing will be affected. Despite that, the second model is susceptible to slow query performance, because schema indexes in Neo4j 3.0 cover node properties only, so the 'key' and 'value' relationship properties cannot be indexed that way.
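
    To make the indexing difference concrete, here is a hedged sketch of the same lookup under both models. The labels `Movie`, `Metadata` and `Locale` and the relationship type `METADATA` are assumed names, and the index statement uses the Neo4j 3.x syntax.

    ```cypher
    // Model A: metadata values live on nodes, so a schema index can back the lookup
    CREATE INDEX ON :Metadata(value);

    MATCH (m:Movie)-[:METADATA]->(md:Metadata)
    WHERE md.value = 'Jurassic Park' AND md.key = 'title' AND md.locale = 'en_GB'
    RETURN m;

    // Model B: key and value sit on the relationship, so the query has to walk
    // every METADATA relationship attached to the shared Locale node and filter
    MATCH (m:Movie)-[r:METADATA]->(:Locale {locale: 'en_GB'})
    WHERE r.key = 'title' AND r.value = 'Jurassic Park'
    RETURN m;
    ```

    Prefixing either query with PROFILE should show whether the planner picks an index seek or expands all relationships and filters them.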