In ArangoDB, I have a graph which contains messages for profiles. For reporting purposes, I need to aggregate the number of messages for specific categories of message from each month for the last 12 months.
The date and category of each message are on every edge between the profile and the message.
Currently, I have:
for mo in 0..11
let month = DATE_MONTH(DATE_SUBTRACT(DATE_NOW(),mo,"month"))
let msgCount = count(
for v, e in outbound "profileId" graph 'MyGraph'
filter e.category == "myCategory" && DATE_MONTH(e.date) == month
return 1
)
return {month: month, msgCount: msgCount}
This query executes in over one second, which is a bit slow for the purposes of this webapp, as I need to create and visualize this report for several categories.
Is there any optimization that could produce the same results faster?
Using db._explain()
shows the following:
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 CalculationNode 1 - LET #7 = 0 .. 11 /* range */ /* simple expression */
3 EnumerateListNode 12 - FOR mo IN #7 /* list iteration */
4 CalculationNode 12 - LET month = DATE_MONTH(DATE_SUBTRACT(DATE_NOW(), mo, "month")) /* v8 expression */
11 SubqueryNode 12 - LET #4 = ... /* subquery */
5 SingletonNode 1 * ROOT
9 CalculationNode 1 - LET #11 = 1 /* json expression */ /* const assignment */
6 TraversalNode 1 - FOR v /* vertex */, e /* edge */ IN 1..1 /* min..maxPathDepth */ OUTBOUND 'profileId' /* startnode */ GRAPH 'MyGraph'
7 CalculationNode 1 - LET #9 = ((e.`category` == "myCategory") && (DATE_MONTH(e.`date`) == month)) /* v8 expression */
8 FilterNode 1 - FILTER #9
10 ReturnNode 1 - RETURN #11
13 CalculationNode 12 - LET #13 = { "month" : month, "msgCount" : COUNT(#4) } /* simple expression */
14 ReturnNode 12 - RETURN #13
Indexes used:
none
Traversals on graphs:
Id Depth Vertex collections Edge collections Filter conditions
6 1..1 persons knows
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 remove-unnecessary-calculations
which can tell us the following:
You should find a way to move the complex calculations out of the iteration.
While working with the DATE_*
functions may be convenient, its not fast. You should calculate the strings YEAR:MONTH you want to filter for in advance, and then inside the the query do range comparison on the calculated strings:
FILTER e.date > '2016:04:' && e.date < '2016:05:'
for mo in 0..11
let month = date_subtract(concat(left(date_iso8601(date_now()),7),'-01T00:00:00.000Z'), mo, "month")
let nextMonth = date_subtract(concat(left(date_iso8601(date_now()),7),'-01T00:00:00.000Z'), mo-1, "month")
let monthNumber = date_month(month)
let msgCount = count(
for v, e in outbound "profileId" graph "MyGraph"
filter e.category == 'myCategory' && e.date > month && e.date < nextMonth
return 1
)
return {month: monthNumber, msgCount: msgCount}
This significantly speeds up the query (5x faster)!