I discovered the following Gremlin query, which was costing 60K RUs in Cosmos DB:
g.addV(label, 'plannedmeal')
    .property('partitionKey', '84ca17dd-c284-4f47-a839-a75bc27f9097')
    .as('meal')
    .V('19760224-7ac1-4316-b9a8-1f7a979274b8')   <--- problem
    .as('food')
    .select('meal')
    .addE('contains')
    .to('food')
    .select('meal')
Through a process of elimination, I learned that the .V('19760224-7ac1-4316-b9a8-1f7a979274b8')
step is the expensive part. I can easily split the query in two, like so:
g.addV(label, 'plannedmeal')
    .property('partitionKey', '84ca17dd-c284-4f47-a839-a75bc27f9097')

g.V('ID_OF_NEW_ITEM')
    .as('meal')
    .V('19760224-7ac1-4316-b9a8-1f7a979274b8')
    .as('food')
    .select('meal')
    .addE('contains')
    .to('food')
    .select('meal')
For reference, the split version costs about 50 RUs total. My question is: why is there a 59,950 RU difference between these two approaches?
Edit: After reviewing the execution profile of the query, I found that the GetVertices operation in the problematic step appears to scan every vertex in my graph. That explains the cost, but it's still not clear why requesting a vertex by its id is so expensive.
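For anyone who wants to reproduce this: the Cosmos DB Gremlin API returns per-step metrics when you append the executionProfile() step to a traversal, so something like the sketch below surfaces the GetVertices counts (note that profiling still executes the traversal, so the vertex and edge are actually created):

g.addV(label, 'plannedmeal')
    .property('partitionKey', '84ca17dd-c284-4f47-a839-a75bc27f9097')
    .as('meal')
    .V('19760224-7ac1-4316-b9a8-1f7a979274b8')
    .as('food')
    .select('meal')
    .addE('contains')
    .to('food')
    .select('meal')
    .executionProfile()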
This is caused by a known limitation of Cosmos DB, called out in the Gremlin API documentation:
Index utilization for Gremlin queries with mid-traversal .V() steps: Currently, only the first .V() call of a traversal will make use of the index to resolve any filters or predicates attached to it. Subsequent calls will not consult the index, which might increase the latency and cost of the query.
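In other words (a minimal illustration, reusing the ids from the question), an id lookup is only a cheap point read when V() is the first step of the traversal:

// Indexed: V() is the first step, so the id filter is resolved as a point lookup.
g.V('19760224-7ac1-4316-b9a8-1f7a979274b8')

// Not indexed: the same V() arrives mid-traversal (after addV), so the id filter
// is evaluated against every vertex instead of consulting the index.
g.addV(label, 'plannedmeal')
    .V('19760224-7ac1-4316-b9a8-1f7a979274b8')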
I adjusted the query to use one of the workarounds suggested by the documentation, and the cost dropped to 23 RUs:
g.addV(label, 'plannedmeal')
    .property('partitionKey', '84ca17dd-c284-4f47-a839-a75bc27f9097')
    .as('meal')
    .map(
        __.V('19760224-7ac1-4316-b9a8-1f7a979274b8')
    )
    .as('food')
    .select('meal')
    .addE('contains')
    .to('food')
    .select('meal')
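The only change is the map() step: wrapping the lookup in an anonymous sub-traversal (__.V(...)) is one of the workarounds the documentation suggests for the mid-traversal .V() limitation, which is presumably why the by-id lookup goes back to being an indexed point read rather than a whole-graph scan, hence the drop from roughly 60K RUs to 23 RUs.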