
OrientDB graph: updating nodes based on index lookup


I have a graph structure with a root node and several container nodes (I'll call them lvl1), each containing hundreds of thousands of content nodes (lvl2). Content nodes may be linked to an arbitrary number of other content nodes. Lvl1 nodes never link to each other, and each links to its lvl2 nodes exactly once. While the graph is being constructed, a link between two lvl2 nodes may appear multiple times; in that case I need to keep count of the links (increment a depth property on the appropriate edge). The order of construction will also be quite random.

I'm looking for an efficient way to manage that graph structure with OrientDB. Building it up is easy; the problem is updating lvl2 nodes (adding more links) and the links between them.

One way to select a node would be a standard SQL query, something like SELECT FROM lvl2nodes WHERE id = 114, but as far as I can see this would scan the whole dataset and be very slow (I haven't tested that yet).

So my idea was to use index lookups. I created an automatic index with CREATE INDEX lvl2node.id UNIQUE and queried it with SELECT FROM INDEX:lvl2node.id WHERE key = 114, which gives me a tuple: ({:key 114, :rid #<ODocument lvl2node#8:1{id:114,in:[2],out:[1]} v1>}).

Now, how can I

a) use that information to select a node and update its properties and

b) find the edge between 2 such vertices to perform an update similarly

Or is there a better method to update a graph's vertices that exploits the graph structure? A lvl1 node will still contain a very large number of links that would need to be traversed without the hash-map approach.

I'm using Clojure's clj-orient API to access orientDB.


Solution

  • As stated in the OrientDB wiki:

    [...] "When you've millions of records indexes show their limitation 
    because the cost to find the records is O(logN). This is also the main 
    reason why Relational DBMS are so slow with huge database.
    
    So when you've millions of record the best way to scale up linearly 
    is  avoid using indexes at all or as much as you can. But how to  
    retrieve records in short time without indexes? Should OrientDB scan 
    the entire database at every query? No. You should use the Graph 
    properties of OrientDB."
    

    I would personally use linkmaps for the tree, as they did in the time series use case.

    It would look something like this:

    create class lvl1
    create class lvl2
    create class lvl3
    
    create property lvl1.id integer
    create property lvl1.lvl2 linkmap lvl2
    create property lvl2.lvl3 linkmap lvl3
    

    The keys in each linkmap would be the ids of the next-level nodes. To get the depth, you would use the length of the next level's linkmap property.
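
To make the keyed-map idea concrete, here is a minimal in-memory sketch in Python. It is not OrientDB or clj-orient code: the classes `Lvl1`, `Lvl2` and `Link` and the helper `add_link` are hypothetical names invented for illustration. It also assumes that links between lvl2 nodes are kept in a map keyed by the target id, with the `depth` counter living on a small link record that stands in for an edge document. The point is only that a keyed lookup finds a node or a link directly, regardless of the order in which the graph is built.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Stand-in for an edge record; `depth` counts how often the link was seen."""
    target: "Lvl2"
    depth: int = 1


@dataclass
class Lvl2:
    id: int
    # Outgoing links keyed by the target node's id (assumed layout).
    links: dict = field(default_factory=dict)


@dataclass
class Lvl1:
    id: int
    # lvl2 children keyed by id, mirroring `lvl1.lvl2 linkmap lvl2`.
    lvl2: dict = field(default_factory=dict)

    def node(self, node_id):
        """Fetch a lvl2 child by key, creating it on first use."""
        if node_id not in self.lvl2:
            self.lvl2[node_id] = Lvl2(node_id)
        return self.lvl2[node_id]


def add_link(source, target):
    """Create the link on first sight, otherwise increment its depth."""
    link = source.links.get(target.id)
    if link is None:
        link = Link(target)
        source.links[target.id] = link
    else:
        link.depth += 1
    return link


container = Lvl1(1)
a, b = container.node(114), container.node(115)
add_link(a, b)
add_link(a, b)  # same link seen again: depth goes up instead of duplicating
```

In this model, updating a node is a single keyed lookup on the container, and finding the link between two nodes is a single keyed lookup on the source node, so neither operation walks all of a container's links.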