Search code examples
query-optimizationgraph-databasesarangodbaql

AQL Optimization - traversal by month


In ArangoDB, I have a graph which contains messages for profiles. For reporting purposes, I need to aggregate the number of messages for specific categories of message from each month for the last 12 months.

The date and category of each message are on every edge between the profile and the message.

Currently, I have:

for mo in 0..11
    let month = DATE_MONTH(DATE_SUBTRACT(DATE_NOW(),mo,"month"))
    let msgCount = count(
        for v, e in outbound "profileId" graph 'MyGraph'
            filter e.category == "myCategory" && DATE_MONTH(e.date) == month
            return 1
        )
    return {month: month, msgCount: msgCount}

This query executes in over one second, which is a bit slow for the purposes of this webapp, as I need to create and visualize this report for several categories.

Is there any optimization that could produce the same results faster?


Solution

  • Analysing the query

    Using db._explain() shows the following:

    Execution plan:
     Id   NodeType            Est.   Comment
      1   SingletonNode          1   * ROOT
      2   CalculationNode        1     - LET #7 = 0 .. 11   /* range */   /* simple expression */
      3   EnumerateListNode     12     - FOR mo IN #7   /* list iteration */
      4   CalculationNode       12       - LET month = DATE_MONTH(DATE_SUBTRACT(DATE_NOW(), mo, "month"))   /* v8 expression */
     11   SubqueryNode          12       - LET #4 = ...   /* subquery */
      5   SingletonNode          1         * ROOT
      9   CalculationNode        1           - LET #11 = 1   /* json expression */   /* const assignment */
      6   TraversalNode          1           - FOR v  /* vertex */, e  /* edge */ IN 1..1  /* min..maxPathDepth */ OUTBOUND 'profileId' /* startnode */  GRAPH 'MyGraph'
      7   CalculationNode        1           - LET #9 = ((e.`category` == "myCategory") && (DATE_MONTH(e.`date`) == month))   /* v8 expression */
      8   FilterNode             1           - FILTER #9
     10   ReturnNode             1           - RETURN #11
     13   CalculationNode       12       - LET #13 = { "month" : month, "msgCount" : COUNT(#4) }   /* simple expression */
     14   ReturnNode            12       - RETURN #13
    
    Indexes used:
     none
    
    Traversals on graphs:
     Id   Depth   Vertex collections   Edge collections   Filter conditions
      6   1..1    persons              knows              
    
    Optimization rules applied:
     Id   RuleName
      1   move-calculations-up
      2   remove-unnecessary-calculations
    

    which can tell us the following:

    • in the traversal we use V8 expressions, which is expensive.
    • Since you're executing a complex filter expression, this can't be executed inside of the traverser - thus filtering is applied after the traversal was done.

    What you could do to improve the situation

    You should find a way to move the complex calculations out of the iteration. While working with the DATE_* functions may be convenient, its not fast. You should calculate the strings YEAR:MONTH you want to filter for in advance, and then inside the the query do range comparison on the calculated strings:

    FILTER e.date > '2016:04:' && e.date < '2016:05:'
    

    Nate's implementation:

    for mo in 0..11
        let month = date_subtract(concat(left(date_iso8601(date_now()),7),'-01T00:00:00.000Z'), mo, "month")
        let nextMonth = date_subtract(concat(left(date_iso8601(date_now()),7),'-01T00:00:00.000Z'), mo-1, "month")
        let monthNumber = date_month(month)
        let msgCount = count(
            for v, e in outbound "profileId" graph "MyGraph"
                filter e.category == 'myCategory' && e.date > month && e.date < nextMonth
                return 1
                )
        return {month: monthNumber, msgCount: msgCount}
    

    This significantly speeds up the query (5x faster)!