Versions used: Neo4j 3.0.6 with Spring-data-neo4j 4.2.0.M1 for POJO mapping
I'm trying to choose how to model data in Neo4j and to compare the benefits and drawbacks of different solutions.
Requirements:
Movie metadata example:
Movie metadata
  locale 'en_GB':
    title: 'Jurassic Park'
    description: 'description in english'
  locale 'fr_FR':
    description: 'description en francais'
  locale 'none':
    actor: 'Jeff Goldblum'
Solution B
Does anyone have experience with solution B? How bad is it to have to lock a node that will be shared by millions of other nodes? What is the impact on performance and scalability?
Does anyone have a better modeling solution?
tl;dr: Go with approach A. Don't worry about orphaned :Locale nodes beyond an occasional cleanup; they will have no noticeable effect on query performance.
Your approach 'A' is by far the better solution. You are correct that the data has to move off the :Movie node: there it would have to be either a nested Map or a list of Maps, and node properties support neither (they can only hold primitive values, strings, and arrays of those). For storage you could convert it into a Map of lists, but that would be very difficult to query, let alone query quickly. Your concern about "orphaned" nodes is unfounded: they affect query performance and data size trivially, if at all, and they are very easy to clean up periodically if you want peace of mind.
MATCH (x:Locale) WHERE NOT (x)<-[:METADATA]-() DETACH DELETE x
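If a cleanup run ever has a large number of orphans to remove, one option is to delete them in bounded batches so that no single transaction grows too large, re-running the statement until it reports zero. This is a sketch; the batch size of 10000 is an arbitrary assumption, not a tuned value.

```cypher
// Delete at most 10000 orphaned :Locale nodes per run
// (orphan = no incoming :METADATA relationship).
// Re-run until `deleted` returns 0.
MATCH (x:Locale)
WHERE NOT (x)<-[:METADATA]-()
WITH x LIMIT 10000
DETACH DELETE x
RETURN count(*) AS deleted;
```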
Run the cleanup once a month, or even never; it really won't affect you much. Your queries are already constrained by the rest of the path, so unless orphaned :Locale nodes substantially outnumber attached ones, you are only adding a small percentage to what is probably already the largest set in your query, and those extra nodes are discarded in the query's first step.
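As a rough illustration, the first query below starts from a single movie and only expands its own :METADATA relationships, so orphaned :Locale nodes elsewhere in the graph are never visited. The second is one way to check whether orphans really do outnumber attached nodes before bothering with cleanup. The :Movie `id` and :Locale `code` properties are assumptions for illustration, not names from the question.

```cypher
// Index the entry point of the traversal (Neo4j 3.0 syntax).
CREATE INDEX ON :Movie(id);

// Starts at one :Movie node; only the :Locale nodes attached to it
// are examined, so orphans do not add work here.
MATCH (m:Movie {id: 'jurassic-park'})-[:METADATA]->(l:Locale {code: 'fr_FR'})
RETURN l.description AS description;

// Count attached vs. orphaned :Locale nodes (one row per group).
MATCH (x:Locale)
WITH size((x)<-[:METADATA]-()) > 0 AS attached
RETURN attached, count(*) AS locales;
```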
As for locking, it only affects write queries, and only while a write transaction is open. You can run a million read-only queries while the write is in progress and none of them will be blocked. Even so, the second model is prone to slow queries, because you can't put indexes on relationship properties (Neo4j 3.0 schema indexes cover node properties only).
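To make the comparison concrete, here is a rough sketch of both models using the Jurassic Park example from the question. Model A stores one :Locale node per movie and locale, with only flat string properties. Model B is assumed, from the question's wording, to use a single shared :Locale node per locale code with the metadata stored on the :METADATA relationships. The :Movie `id` and :Locale `code` properties are illustrative assumptions, and the two halves describe alternative schemas, so they are not meant to run against the same database.

```cypher
// ---- Model A: one :Locale node per (movie, locale) pair ----
// All values are plain strings, which node properties support.
CREATE (m:Movie {id: 'jurassic-park'})
CREATE (m)-[:METADATA]->(:Locale {code: 'en_GB', title: 'Jurassic Park',
                                  description: 'description in english'})
CREATE (m)-[:METADATA]->(:Locale {code: 'fr_FR',
                                  description: 'description en francais'})
CREATE (m)-[:METADATA]->(:Locale {code: 'none', actor: 'Jeff Goldblum'});

// Locale data lives on nodes, so a schema index can serve this lookup.
CREATE INDEX ON :Locale(title);

MATCH (m:Movie)-[:METADATA]->(l:Locale {title: 'Jurassic Park'})
RETURN m.id AS movie, l.code AS locale;

// ---- Model B (assumed shape): one shared :Locale node per locale code ----
// Creating the relationship takes a write lock on the shared :Locale node
// until the transaction commits, so concurrent writes to it wait in turn.
MATCH (m:Movie {id: 'jurassic-park'}), (l:Locale {code: 'en_GB'})
CREATE (m)-[:METADATA {title: 'Jurassic Park'}]->(l);

// No schema index can cover r.title, so the filter is evaluated on
// each :METADATA relationship the query expands.
MATCH (m:Movie)-[r:METADATA {title: 'Jurassic Park'}]->(:Locale {code: 'en_GB'})
RETURN m.id AS movie;
```

With Spring Data Neo4j, model A would typically map to a separate Locale entity class related to Movie through a METADATA relationship, rather than a Map field on the Movie POJO.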