gremlin amazon-neptune

Multi-tenancy in AWS Neptune database


I'm new to Neptune. What is the best way to support multi-tenancy in the Neptune database?
The requirements:
1. Support thousands of tenants in one database (a single cluster).
2. Keep queries from becoming overly complicated by tenant filtering.
3. Good performance (ideally using data partitioning for faster query times).
4. Security: make it hard to make mistakes that would cause cross-tenant access.


Solution

  • In a non-production environment, the Gremlin partition strategy (TinkerPop's PartitionStrategy) proved sufficient for me. The vertices and edges co-exist in the same cluster, and each one has a property that identifies its partition. In my case I used an _env property.

    Then, in my Java code, each time I request a traversal from my factory, it applies the partition strategy.

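        // Imports needed by this fragment of the factory class:
        // import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
        // import org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource;
        // import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
        // import org.apache.tinkerpop.gremlin.process.traversal.strategy.decoration.PartitionStrategy;

        // getReadOnlyCluster() and buildReadOnlyStrategy() live elsewhere in the factory (not shown).
        private GraphTraversalSource buildReadOnlyTraversal() {
            log.debug("building read-only traversal");
            return AnonymousTraversalSource.traversal()
                    .withRemote(DriverRemoteConnection.using(getReadOnlyCluster()))
                    .withStrategies(buildPartitionStrategy(), buildReadOnlyStrategy());
        }

        // Reads and writes through the resulting traversal source are limited to one partition value.
        private PartitionStrategy buildPartitionStrategy() {
            var env = this.properties.getEnvironmentPartition();
            log.info("building partition strategy for environment={}", env);

            return PartitionStrategy.build()
                    .partitionKey(ENVIRONMENT_PARTITION_KEY) // property name, "_env" in my case
                    .writePartition(env)                     // stamped on new vertices and edges
                    .readPartitions(env)                     // only this partition is visible to reads
                    .create();
        }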
    

    Traversals built this way are automatically scoped to your partition. The big gotcha, however, is that you need to remember to add the partition filter manually when querying from the console (or from anything else that doesn't go through the partition strategy mechanism), e.g.

    g.V().hasLabel('user').has('_env', 'dev')

    I think this meets the first two of your criteria; performance I can't really comment on. As for point 4, it hasn't been a problem from application code; errors are more likely when manually tinkering with the graph.
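
To make the scoping behaviour described above concrete, here is a minimal plain-Java model of the idea: writes are stamped with the partition value, and reads are filtered by it. This is only a sketch and not TinkerPop's implementation. `PartitionSketch`, `Store`, `ScopedView` and `unscopedNames` are hypothetical names invented for illustration, and the "graph" is just a list of property maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of partition scoping. NOT TinkerPop's implementation. */
final class PartitionSketch {

    /** Hypothetical stand-in for a graph: a flat list of vertices, each a property map. */
    static final class Store {
        final List<Map<String, String>> vertices = new ArrayList<>();
    }

    /** A view of the store pinned to a single partition value. */
    static final class ScopedView {
        private final Store store;
        private final String partitionKey;
        private final String partition;

        ScopedView(Store store, String partitionKey, String partition) {
            this.store = store;
            this.partitionKey = partitionKey;
            this.partition = partition;
        }

        /** Write path: always stamp the partition property, so callers cannot forget it. */
        void addVertex(String label, String name) {
            Map<String, String> vertex = new HashMap<>();
            vertex.put("label", label);
            vertex.put("name", name);
            vertex.put(partitionKey, partition);
            store.vertices.add(vertex);
        }

        /** Read path: return only vertices whose partition property matches this view. */
        List<String> names(String label) {
            List<String> result = new ArrayList<>();
            for (Map<String, String> vertex : store.vertices) {
                if (label.equals(vertex.get("label"))
                        && partition.equals(vertex.get(partitionKey))) {
                    result.add(vertex.get("name"));
                }
            }
            return result;
        }
    }

    /** Unscoped read: the analogue of a console query that omits the partition filter. */
    static List<String> unscopedNames(Store store, String label) {
        List<String> result = new ArrayList<>();
        for (Map<String, String> vertex : store.vertices) {
            if (label.equals(vertex.get("label"))) {
                result.add(vertex.get("name"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Store store = new Store();
        ScopedView dev = new ScopedView(store, "_env", "dev");
        ScopedView prod = new ScopedView(store, "_env", "prod");

        dev.addVertex("user", "alice");
        prod.addVertex("user", "bob");

        System.out.println(dev.names("user"));            // prints [alice]
        System.out.println(prod.names("user"));           // prints [bob]
        System.out.println(unscopedNames(store, "user")); // prints [alice, bob]
    }
}
```

The last call shows the gotcha: any access path that bypasses the scoped view sees every partition, which is why manual queries need the explicit `has('_env', ...)` filter.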