Search code examples
arangodbaql

ArangoDB Parent to Child edge creation on existing 1 milltion docucments for nested leves not working/or SLOW


Created events document in ArangoDB. Loaded 1 million records as shown below which completes in 40 seconds.

FOR I IN 1..1000000
INSERT {
    "source": "ABC",
    "target": "ABC",
    "type": "REST",
    "attributes" : { "MyAtrib" : TO_STRING(I)},
    "mynum" : I
} INTO events

So record 1 is super parent, and 2 is child of 1 etc. 1 --> 2 --> 3 --> 4 --> ...1000000

Created empty Edge collection ChildEvents, and tried to establish the parent to child edge relations through the below query, but it never completes (created a hash index on mynum, but no luck)

FOR p IN events
   FOR c IN events
        FILTER p.mynum == ( c.mynum + 1 )
            INSERT { _from: p._id, _to: c._id}  INTO ChildEvents

Any help would be greatly appreciated.


Solution

  • Creating the event documents took around 50 seconds on my system. I added an index for mynum to the events collection and ran your second query (with an added RETURN NEW at the end), and it took roughly 70 seconds to process the edges (plus some time to render a subset of them):

    1m edges written in 69 seconds

    I used ArangoDB 3.6.0 with RocksDB engine under Windows 10, Intel i7-6700K 4x4.0 GHz, 32 GB RAM, Samsung Evo 850 SSD.

    Are you sure that the index is set up correctly? Explain the query and check the execution plan, maybe something is different for you?

    Execution plan:
     Id   NodeType                     Est.   Comment
      1   SingletonNode                   1   * ROOT
      3   EnumerateCollectionNode   1000000     - FOR c IN events   /* full collection scan, projections: `mynum`, `_id` */
      9   IndexNode                 1000000       - FOR p IN events   /* persistent index scan, projections: `_id` */
      6   CalculationNode           1000000         - LET #5 = { "_from" : p.`_id`, "_to" : c.`_id` }   /* simple expression */   /* collections used: p : events, c : events */
      7   InsertNode                1000000         - INSERT #5 IN ChildEvents 
      8   ReturnNode                1000000         - RETURN $NEW
    
    Indexes used:
     By   Name                      Type         Collection   Unique   Sparse   Selectivity   Fields        Ranges
      9   idx_1655926293788622848   persistent   events       true     false       100.00 %   [ `mynum` ]   (p.`mynum` == (c.`mynum` + 1))
    
    Optimization rules applied:
     Id   RuleName
      1   move-calculations-up
      2   move-filters-up
      3   interchange-adjacent-enumerations
      4   move-calculations-up-2
      5   move-filters-up-2
      6   remove-data-modification-out-variables
      7   use-indexes
      8   remove-filter-covered-by-index
      9   remove-unnecessary-calculations-2
     10   reduce-extraction-to-projection