python performance insert gremlin amazon-neptune

Neptune slow to insert data

I'm trying to load data with a Python script that creates 26,000 vertices and their related edges. Using gremlin-python, the script looks like this:

g.V().has('dog', 'name', 'pluto').fold().\
    coalesce(__.unfold(), __.addV('dog').property('name', 'pluto')).store('dog').\
    V().has('person', 'name', 'sam').fold().\
    coalesce(__.unfold(), __.addV('person').property('name', 'sam')).store('person').\
    select('person').unfold().\
    coalesce(__.outE('has_dog').where(__.inV().where(eq('dog')).by(T.id).by(__.unfold().id())),
             __.addE('has_dog').to(__.select('person').unfold())).toList()

I can append up to 50 vertices and edges in the same transaction. When I run this script on my PC (i7 with 16 GB RAM) it takes 4 to 5 minutes, but on a Neptune instance with 8 CPUs and 32 GB RAM the script is only 10% done after 20 minutes. How can Neptune be so much slower?

Thanks.

Solution

Each Neptune instance that you connect to has a pool of worker threads, sized at two times the number of vCPUs on the instance. If you send queries in a single-threaded fashion, you are only using one of those worker threads. You can substantially increase throughput by dividing the work across multiple tasks in your application. I often use the multiprocessing library, but even basic Python threads will likely help: these are IO-bound tasks, so the threads will likely yield while they wait. I have added millions of vertices and edges using Python in this way. Without doing something like this you are not taking full advantage of the resources available on the instance. If you already have the work divided into batches of 50, you can spread those batches across multiple threads. Matching the number of client threads/tasks to two times the number of vCPUs on the Neptune instance is a good place to start.

Ideally, each thread will touch a different part of the graph, so that two threads do not try to modify the same vertices and edges at the same time.
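As a rough illustration of spreading batches across threads, here is a minimal sketch using the standard library's concurrent.futures. The names chunked, load_in_parallel and submit_batch are hypothetical and not part of gremlin-python. submit_batch stands for whatever function runs one batch of up to 50 upserts as a single Gremlin query, and how connections are created or shared between threads is left to that function. The defaults of 50 rows per batch and 8 vCPUs come from the question.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load_in_parallel(rows, submit_batch, batch_size=50, vcpus=8):
    """Run `submit_batch` on every batch from a pool of client threads.

    `submit_batch` is a hypothetical callable supplied by the caller; it
    should send one batch to Neptune as a single query. The pool size
    follows the suggestion above: two times the instance's vCPU count.
    Returns the number of batches that completed.
    """
    workers = 2 * vcpus
    completed = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(submit_batch, batch)
                   for batch in chunked(rows, batch_size)]
        for future in as_completed(futures):
            future.result()  # re-raises any exception from the worker thread
            completed += 1
    return completed
```

Calling load_in_parallel(rows, submit_batch) with 26,000 rows would produce 520 batches and keep up to 16 of them in flight at once. To reduce contention, it may help to order or group the rows so that batches running at the same time do not upsert the same person or dog.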