OrientDB: 2.1.3
pyorient: 1.4.7
I need to import a graph with one hundred thousand vertices and half a million edges into OrientDB using pyorient.
At first I simply used db.command("create vertex V set a=1") to insert all the vertices and edges one by one.
But that takes about two hours, so I want to find a way to optimize this process.
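To make the one-by-one approach concrete, here is a minimal sketch of how such commands could be generated. The vertex data and the `vertex_commands` helper are hypothetical; only the `db.command` call is from pyorient itself, and it would need an open connection to run.

```python
def vertex_commands(vertices):
    """Build one CREATE VERTEX statement per vertex property dict."""
    cmds = []
    for props in vertices:
        fields = ", ".join("%s=%r" % (k, v) for k, v in sorted(props.items()))
        cmds.append("create vertex V set " + fields)
    return cmds

vertices = [{"a": 1}, {"a": 2}]
cmds = vertex_commands(vertices)

# Each statement is then sent individually, e.g.:
# for cmd in cmds:
#     db.command(cmd)   # one network round trip per record -> slow
```

With one round trip per record, network latency dominates, which is why hundreds of thousands of records take hours.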
Then I found that OrientDB supports a Massive Insert intent, but unfortunately the author of pyorient mentioned in the issue "massive insertion: no transacations?" that in the binary protocol (and therefore in pyorient) the massive insert intent is not available.
However, pyorient does support SQL batch. Maybe this is an opportunity!
I put all the insert commands together and ran them with db.batch().
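A minimal sketch of what I mean by putting the commands together: wrap the statements in begin/commit so the server runs them as one batch. The `build_batch_script` helper is my own illustration; `db.batch()` is the pyorient call and needs a live connection.

```python
def build_batch_script(commands):
    """Wrap SQL statements in begin/commit for an OrientDB SQL batch."""
    return "begin;" + ";".join(commands) + ";commit;"

script = build_batch_script([
    "create vertex V set a=1",
    "create vertex V set a=2",
])

# db.batch(script)   # single round trip, but the server still parses
#                    # and executes every statement one at a time
```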
Taking a graph with 5000 vertices and 20000 edges as an example:

SQL batch
    vertices: 25.1708816278 s
    edges: 254.248636227 s
original (one by one)
    vertices: 19.5094766904 s
    edges: 147.627924276 s

It seems that SQL batch actually costs much more time.
So I want to know whether there is a faster way to do this.
Thanks.
For the one-by-one approach, have you already tried using a transactional graph and committing every X items? That is usually the correct way to insert a lot of data. Unfortunately, as you yourself noted, the Massive Insert intent cannot be used from pyorient. Multi-process approaches cannot help either: the driver holds a single socket connection and does not implement a connection pool, so all your concurrent operations are serialized (as in a pipeline) and the performance advantage of multiprocessing is lost.
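The "commit every X items" idea above can be sketched as follows. The chunking helper is generic Python; the commented part shows how it might be combined with pyorient's binary-protocol transactions (`tx_commit` / `record_create` / `tx.attach`), which require a connected client, so treat it as an illustration rather than a tested recipe. The cluster id `-1` and the record shape are assumptions for the example.

```python
def chunks(items, size):
    """Yield successive fixed-size slices of items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical usage: commit every 500 records instead of per record
# or all at once.
#
# for group in chunks(all_records, 500):
#     tx = client.tx_commit()
#     tx.begin()
#     for rec in group:
#         pos = client.record_create(-1, {'@V': rec})
#         tx.attach(pos)
#     tx.commit()

sizes = [len(c) for c in chunks(list(range(1050)), 500)]
```

Tuning the chunk size (hundreds to a few thousand) trades memory held per transaction against the per-commit overhead.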