Using the Graph API in Cosmos DB is VERY slow compared to the (DocumentDB) SQL API


I have a Cosmos DB account set up with the Graph API, holding a graph with ~4k vertices and ~10k edges. Similar queries issued through the Graph API and the DocumentDB API against the same database show significantly different run times. I've been testing the difference between the APIs using the following Node application:

var Gremlin = require('gremlin');
var config = require("./config");
var documentdb = require('documentdb');

const docClient = new documentdb.DocumentClient(....);
const graphClient = Gremlin.createClient(....);


const start = new Date();
graphClient.execute('g.V("12345")', {}, (err, results) => {
    const end = new Date();
    if (err) {
        return console.error(err);
    }

    console.log(`GraphDB API Results in: ${(end.getTime() - start.getTime()) / 1000}`);
});

var querySpec = {
    query: 'SELECT * FROM c ' +
           'WHERE c.id = "12345"',

};
const docStart = new Date();
docClient.queryDocuments("dbs/graphdb/colls/sn", querySpec).toArray((err, results) => {
    const docEnd = new Date();
    if (err) {
        console.error(JSON.stringify(err, null, 2));
        return;
    }

    console.log(`DocumentDB API Results in: ${(docEnd.getTime() - docStart.getTime()) / 1000}`)
});

The output of this code shows that the single document being queried is returned by the Graph API in ~1.8 seconds, whereas the same document is returned by the DocumentDB API in ~0.3 seconds.
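
A single timed call per API cannot distinguish per-query latency from one-off costs such as opening the connection. A minimal sketch of a repeated measurement follows, assuming only that each client exposes a Node-style callback as in the code above; `timeRuns` and `summarize` are hypothetical helpers, not part of either SDK:

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper (not part of the gremlin or documentdb packages):
// runs a callback-style operation `runs` times, one after another,
// and resolves with the duration of each run in milliseconds.
function timeRuns(operation, runs) {
    const timings = [];
    const runOnce = () => new Promise((resolve, reject) => {
        const start = performance.now();
        operation((err) => {
            if (err) {
                return reject(err);
            }
            timings.push(performance.now() - start);
            resolve();
        });
    });

    let chain = Promise.resolve();
    for (let i = 0; i < runs; i++) {
        chain = chain.then(runOnce);
    }
    return chain.then(() => timings);
}

// Separates the first run (which may include connection setup; this is an
// assumption, not something measured here) from the remaining runs.
// For an even number of remaining runs the upper middle value is reported.
function summarize(timings) {
    const rest = timings.slice(1).sort((a, b) => a - b);
    const medianOfRest = rest.length ? rest[Math.floor(rest.length / 2)] : NaN;
    return { first: timings[0], medianOfRest: medianOfRest };
}
```

With the clients from the code above, it could be called as `timeRuns(cb => graphClient.execute('g.V("12345")', {}, cb), 10).then(t => console.log(summarize(t)))`, and likewise with `docClient.queryDocuments(...).toArray(cb)`. If the Graph API is still much slower in `medianOfRest`, the gap is per-query rather than a one-off startup cost.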

DocumentDB API Result:

[
  {
    "label": "company",
    "id": "12345",
    "parent": [
      {
        "_value": "54321",
        "id": "de7c87f7-83db-43c2-8ddd-c5487dd5682e"
      }
    ],
    "name": [
      {
        "_value": "Acme Co",
        "id": "b4316415-d5c3-4dcc-ac5f-64b1d8c8bd62"
      }
    ],
    "_rid": "KPk3APUeEgFcAAAAAAAAAA==",
    "_self": "dbs/KPk3AA==/colls/KPk3APUeEgE=/docs/KPk3APUeEgFcAAAAAAAAAA==/",
    "_etag": "\"0000df07-0000-0000-0000-5a2b23bd0000\"",
    "_attachments": "attachments/",
    "_ts": 1512776637
  }
]

GraphDB API Result:

[
  {
    "id": "12345",
    "label": "company",
    "type": "vertex",
    "properties": {
      "parent": [
        {
          "id": "de7c87f7-83db-43c2-8ddd-c5487dd5682e",
          "value": "54321"
        }
      ],
      "name": [
        {
          "id": "b4316415-d5c3-4dcc-ac5f-64b1d8c8bd62",
          "value": "Acme Co"
        }
      ]
    }
  }
]

All of these examples were run on a fixed-size collection with the RUs turned all the way up to 10,000.

Am I doing something wrong here? Do I need to make better/more/fewer indices? It seems crazy that a cloud scale database like Cosmos can't return a single document in less than a second regardless of the query structure.

I have examples of simple traversals (g.V().hasLabel('x').out('y').hasLabel('z')) that take over 5 seconds to return when the hasLabel('x') count is ~40. If the hasLabel('x') count is ~1000, the traversal takes over 15 seconds to return. This seems very slow to me.

I've looked around for performance numbers, but haven't found any examples. At the end of the day, am I just expecting too much from this technology?


Solution

  • Thanks to Microsoft for figuring out the issue. There was a problem with their rollout of the Gremlin API endpoints: if I'm understanding the message from Microsoft correctly, my instance was calling a Gremlin endpoint in a different region from my database instance, which was causing the issues.

    I was given a feature flag to set on the portal to force deployment of a new database on their new infrastructure.

    I'm now seeing sub-500 ms response times for all of my queries and traversals.